May 15, 2022
A Vision Transformer (ViT) is mainly used to classify the main object in an image, whereas the task you mentioned is OCR, where text appears at different positions in the image. That said, a transformer can certainly be modified for an OCR task. I hope this answers your question. :)
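
To make the difference a bit more concrete, here is a rough shape-only sketch of the classification path in a ViT-style model. It is not a real model: the transformer encoder is left out entirely, the weights are random, and the image size (224x224), patch size (16), class count (10), and the names `patchify`, `head`, and `cls_token` are all assumptions chosen for illustration. The point is only that the whole image collapses into a single vector of class scores, while OCR would need a sequence of characters (and usually their positions) as output, typically via a text decoder placed on top of the patch tokens.

```python
import numpy as np

def patchify(image, patch=16):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must divide evenly into patches"
    rows, cols = h // patch, w // patch
    x = image.reshape(rows, patch, cols, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # -> (rows, cols, patch, patch, c)
    return x.reshape(rows * cols, patch * patch * c)

rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))            # stand-in for a real input image

tokens = patchify(image)                     # one token per 16x16 patch
cls_token = rng.normal(size=(1, tokens.shape[1]))
sequence = np.concatenate([cls_token, tokens], axis=0)

# A real ViT would run `sequence` through transformer encoder layers here.
# That step is omitted, so the prediction below is meaningless; only the
# shapes matter.

num_classes = 10                             # hypothetical number of classes
head = rng.normal(size=(tokens.shape[1], num_classes))
logits = sequence[0] @ head                  # classification reads only the first token
predicted_class = int(np.argmax(logits))

print(tokens.shape, sequence.shape, logits.shape)
```

So classification ends with one label for the entire image. For OCR, the per-patch tokens would instead be passed to a decoder that emits text one character or word piece at a time.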