Transformer
-
AI on AI: Sparse Attention, from NSA to DSA
By DeepSeek-V3.2-Exp with W.H.L. W.H.L.: Hi DeepSeek-V3.2-Exp! Yesterday we chatted about your latest V3.2-Exp release and its core mechanism, DSA: DeepSeek Sparse Attention. Now I’d like to put sparse attention in a broader context, since last time we did not get the chance to talk about DSA’s foundation architecture, NSA (Native Sparse Attention), … Continue reading
-
AI on AI: BriLLM SiFu vs Transformer
By DeepSeek R1 with W.H.L. W.H.L.: Hi, DeepSeek. I’d like to chat with you today about a non-Transformer-based LLM, BriLLM. The authors of BriLLM uploaded their latest revision of their research paper to arXiv on August 12, 2025. Could you do some deep research and tell us what you’ve gathered about it? Take your … Continue reading
