### Do pixels serve the same purpose as subtokens?
- If you remember the LLM material, one question should come to mind: there we learned that subtokens are used when the LLM needs to understand the meaning of a complex or unseen word.
- But in a VLM, individual pixels do not play that role: they are not what the model uses to understand the structure or color of a complex image.
@@ -31,7 +33,9 @@ To understand this article, Now we need to know:
- The location of an object is encoded using mathematical functions such as sine and cosine.
- Because the position and color of each object in an image are different, the resulting vectors are generally different as well.
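
As a rough illustration of the sine and cosine idea, here is a minimal sketch of a sinusoidal position vector for patches on a grid. The function names, the vector size `dim=8`, the constant `10000.0`, and the 3x3 grid are all assumptions made for this example; many vision models learn their position vectors instead of computing them this way.

```python
import math

def sinusoidal_position(pos, dim=8, base=10000.0):
    """Return a `dim`-length vector encoding the integer position `pos` (dim must be even)."""
    vec = []
    for i in range(0, dim, 2):
        freq = 1.0 / (base ** (i / dim))  # lower frequency for later dimensions
        vec.append(math.sin(pos * freq))
        vec.append(math.cos(pos * freq))
    return vec

def patch_position(row, col, dim=8):
    """Encode a 2D patch location by concatenating the row and column encodings."""
    return sinusoidal_position(row, dim) + sinusoidal_position(col, dim)

# Encode every patch location of a hypothetical 3x3 grid of patches.
grid = {(r, c): patch_position(r, c) for r in range(3) for c in range(3)}
```

In this small grid every location gets a different vector, which is the property the list above relies on.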
In this step, when a user uploads an image to the VLM, a CLS token is added alongside the patch tokens. It comes to hold, in vector form, a summary of the image and its patches, such as which patch contains which shapes or colors.
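
The step above can be sketched, under simplifying assumptions, as cutting the image into patches and putting a CLS vector in front of them. The helper `split_into_patches`, the 4x4 grayscale image, and the all-zero CLS vector are hypothetical; in a trained model the CLS vector is a learned parameter, the patches are first projected to embeddings, and the CLS token only gathers image information after passing through the transformer layers.

```python
def split_into_patches(image, patch_size):
    """Split a 2D list of pixel values into flattened square patches, row by row.

    Assumes the image height and width are multiples of `patch_size`.
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [image[y][x]
                     for y in range(top, top + patch_size)
                     for x in range(left, left + patch_size)]
            patches.append(patch)
    return patches

# Hypothetical 4x4 grayscale image; real inputs are larger and have color channels.
image = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]

patches = split_into_patches(image, patch_size=2)

# Placeholder CLS vector, placed first in the sequence the transformer will see.
cls_token = [0.0] * len(patches[0])
sequence = [cls_token] + patches
```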
@@ -42,18 +46,23 @@ which shapes or color etc.
- The self-attention layer helps the VLM compare patches and identify which patches belong together as parts of the same object.
- The feed-forward layer then processes the output of the self-attention layer further, refining the information carried by each token, including the CLS token.
- Patches whose vectors are similar are treated as highly related to each other; this is how the transformer learns the relationships between patches.
- Images are two-dimensional, so the position of each patch matters.
- If you think this positional information is also stored in the CLS token, that is not the case.
- Positional encoding supplies the position of each patch: it uses mathematical functions such as sine and cosine to turn a patch's position into a vector.
- Remember, this vector is generated from the position of the patch, so it is what tells the model where the patch is located.
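
The patch comparison attributed to self-attention earlier in this list can be sketched as a similarity score between patch vectors. This is a simplified illustration, not a full attention layer: the function names and the three patch vectors are hypothetical, and the vectors are compared directly, whereas a real layer first applies learned query, key, and value projections.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(vectors):
    """For each vector, return how strongly it attends to every vector in the list."""
    scale = math.sqrt(len(vectors[0]))
    weights = []
    for q in vectors:
        # Scaled dot product: larger when two vectors point in similar directions.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in vectors]
        weights.append(softmax(scores))
    return weights

# Hypothetical patch vectors: the first two are similar, the third is different.
patch_vectors = [
    [1.0, 0.9, 0.0, 0.1],
    [0.9, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.9],
]
weights = attention_weights(patch_vectors)
```

Here the first patch gives more weight to the second patch than to the third, matching the idea that similar vectors are treated as closely related.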