
Do you think it's a good idea to use transformer to learn the positions of information in documents…
Daniel Tung
Sik-Ho Tsang
May 15, 2022
A Vision Transformer mainly classifies the main object in an image, whereas the task you mention is an OCR task, in which text appears at different positions. That said, a transformer can of course be modified for OCR. I hope that answers your question. :)
Written by Sik-Ho Tsang
PhD, Researcher. I share what I learn. :) Linktree: https://linktr.ee/shtsang for Twitter, LinkedIn, etc.