A large language model (LLM) is a language model consisting of a neural network with many parameters (typically billions of weights or more), trained on a large quantity of unlabeled text using unsupervised learning. LLMs emerged around 2018 and perform well on a wide range of tasks. This has shifted the focus of natural language processing research away from the earlier paradigm of training specialized supervised models for specific tasks.

Characteristics

Although the term "large language model" has no formal definition, it usually refers to deep learning models with on the order of billions of parameters or more. LLMs are general-purpose models that excel at a wide range of tasks, in contrast to models trained for a single specific task such as sentiment analysis, named-entity recognition, or mathematical reasoning. Even though they are trained on simple tasks such as predicting the next word in a sentence, neural language models with sufficient training and parameter counts capture much of the syntax and semantics of human language. In addition, large language models display considerable general knowledge about the world and are able to memorize a large number of facts during training.

Architecture and training

LLMs most commonly use the transformer architecture, which since 2018 has been the standard deep learning technique for sequential data (previously, recurrent architectures such as long short-term memory (LSTM) models were the most common). LLMs are trained without supervision on unannotated text. A transformer that generates output from left to right is trained to maximize the probability assigned to the next word in the training data, given the preceding context. Alternatively, an LLM can use a bidirectional transformer (as in BERT), which assigns a probability distribution over words while having access to both the preceding and the following context. In addition to the task of predicting the next word (or a masked word), LLMs may be trained on auxiliary tasks that test their understanding of the data distribution, such as next sentence prediction (NSP), in which pairs of sentences are presented and the model must predict whether they appear consecutively in the training corpus.

The earliest LLMs were trained on corpora containing on the order of billions of words. The first version of GPT was trained in 2018 on BookCorpus, consisting of 985 million words. In the same year, BERT was trained on a combination of BookCorpus and English Wikipedia, totaling 3.3 billion words. Since then, training corpora for LLMs have grown by orders of magnitude, reaching hundreds of billions or trillions of tokens.

Training LLMs is computationally expensive. A 2020 study estimated the cost of training a 1.5-billion-parameter model (one to two orders of magnitude smaller than the state of the art at the time) at 1.6 million US dollars. A 2020 analysis found that the capability of neural language models (as measured by the training loss) increased smoothly, following a power law, with the number of parameters, the amount of training data, and the compute used for training. These relationships were verified over a wide range of values (up to seven orders of magnitude), and no attenuation of the relationship was observed at the upper end of the range, including for network sizes of up to trillions of parameters.
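The left-to-right training objective described earlier in this section can be made concrete with a short sketch. The following is a minimal illustration, not the training code of any actual model: it assumes PyTorch, replaces the transformer stack with a toy embedding-plus-linear layer, and only shows how the next-token cross-entropy loss is formed.

```python
# Minimal sketch of the autoregressive (next-token) training objective.
# Assumes PyTorch; the embedding + linear "model" is a stand-in for a real
# transformer (which would also apply a causal attention mask).
import torch
import torch.nn as nn

vocab_size, d_model = 100, 32
embed = nn.Embedding(vocab_size, d_model)
to_logits = nn.Linear(d_model, vocab_size)       # stand-in for a transformer stack

tokens = torch.randint(0, vocab_size, (1, 16))   # one sequence of 16 token ids
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict token t+1 from tokens <= t

logits = to_logits(embed(inputs))                # (batch, seq, vocab)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1)
)
# Training maximizes the probability assigned to each next token, i.e. it
# minimizes this cross-entropy; loss.backward() would then update the weights.
```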
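The power-law behaviour described in the preceding paragraph can be written down in a simple functional form. The sketch below is purely illustrative: it uses a generic two-term power law in model size and data size, and the exponents and constants are placeholder values rather than the coefficients fitted in the published analysis.

```python
# Illustrative neural-scaling power law: the training loss falls off as a power
# of model size N (parameters) and dataset size D (tokens).
# All constants below are placeholders for illustration, not a published fit.
def estimated_loss(n_params: float, n_tokens: float,
                   alpha_n: float = 0.076, alpha_d: float = 0.095,
                   n_c: float = 8.8e13, d_c: float = 5.4e13) -> float:
    return (n_c / n_params) ** alpha_n + (d_c / n_tokens) ** alpha_d

# Increasing parameters or data lowers the predicted loss smoothly,
# with no saturation over the ranges studied:
print(estimated_loss(1e9, 1e11), estimated_loss(1e10, 1e12))
```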
Application to downstream tasks

Between 2018 and 2020, the standard method for applying an LLM to a specific NLP task was to fine-tune the model with additional task-specific training. It was subsequently found that more powerful LLMs, such as GPT-3, can solve tasks without additional training by means of prompting techniques, in which the problem to be solved is presented to the model as a text prompt, possibly together with some textual examples of similar problems and their solutions.

Fine-tuning

Fine-tuning is the practice of modifying an existing pretrained language model by training it, with supervision, on a specific task (for example, sentiment analysis, named-entity recognition, or part-of-speech tagging). It is a form of transfer learning. It usually involves introducing a new set of weights connecting the final layer of the language model to the output of the downstream task. The original weights of the language model may be frozen, so that only the new layer of weights connecting them to the output is learned during training. Alternatively, the original weights may receive small updates, possibly with earlier layers frozen (a minimal sketch of this setup appears below, after the Prompting subsection).

Prompting

In the prompting paradigm, popularized by GPT-3, the problem to be solved is formulated as a text prompt, which the model must solve by supplying a completion obtained through statistical inference. In few-shot prompting, the prompt includes a small number of examples of similar tasks together with their solutions. For example, the sentiment analysis task of labeling the sentiment of a movie review could be prompted as follows:

Review: This movie stinks.
Sentiment: negative
Review: This movie is fantastic.
Sentiment:

If the model outputs "positive", it has solved the task correctly. In zero-shot prompting, no worked examples are provided. An example of a zero-shot prompt for a question-answering task is: "Who wrote the book The Origin of Species?" The few-shot performance of LLMs has been shown to achieve competitive results on NLP tasks, sometimes surpassing earlier state-of-the-art fine-tuning approaches. Examples of such NLP tasks are translation, question answering, cloze tasks, word unscrambling, and using a new word in a sentence. Creating and optimizing such prompts is called prompt engineering and is currently an active area of research.
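As noted in the Fine-tuning subsection above, a common setup freezes the pretrained weights and trains only a new output layer. The sketch below is a minimal illustration assuming PyTorch; PretrainedEncoder is a hypothetical stand-in for a real pretrained language model, not a library class.

```python
# Minimal sketch of fine-tuning with frozen base weights (assumes PyTorch).
# PretrainedEncoder stands in for a real pretrained language model.
import torch
import torch.nn as nn

class PretrainedEncoder(nn.Module):
    def __init__(self, vocab_size=100, d_model=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.layer = nn.Linear(d_model, d_model)          # stand-in for transformer layers
    def forward(self, token_ids):
        return self.layer(self.embed(token_ids)).mean(dim=1)  # pooled representation

encoder = PretrainedEncoder()          # pretend these weights are pretrained
for p in encoder.parameters():
    p.requires_grad = False            # freeze the original language-model weights

classifier = nn.Linear(32, 2)          # new task head, e.g. positive/negative sentiment
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)  # only the head is trained

tokens = torch.randint(0, 100, (4, 16))   # toy batch of token ids
labels = torch.tensor([0, 1, 1, 0])
loss = nn.functional.cross_entropy(classifier(encoder(tokens)), labels)
loss.backward()
optimizer.step()
# Unfreezing some or all encoder parameters (often with a smaller learning rate)
# corresponds to the alternative of giving the original weights small updates.
```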
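The few-shot prompt shown in the example above can also be assembled programmatically. The following sketch only constructs the prompt text; complete() is a hypothetical placeholder for whatever text-completion interface would be called, not a real API.

```python
# Assembling a few-shot sentiment prompt; complete() is a hypothetical
# placeholder for a text-completion call, not an actual API.
def build_few_shot_prompt(examples, query):
    lines = []
    for review, sentiment in examples:
        lines.append(f"Review: {review}\nSentiment: {sentiment}")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)

examples = [("This movie stinks.", "negative")]
prompt = build_few_shot_prompt(examples, "This movie is fantastic.")
print(prompt)
# A zero-shot prompt would omit the worked examples entirely, e.g.:
# "Who wrote the book The Origin of Species?"
# answer = complete(prompt)   # hypothetical call to a completion model
```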
List of large language models

Name | Release date | Developer | Number of parameters | Corpus size | License | Notes
BERT | 2018 | Google | 340 million | 3.3 billion words | Apache 2.0 | An early and influential language model.
GPT-2 | 2019 | OpenAI | 1.5 billion | 40 GB (about 10 billion tokens) | MIT | A general-purpose model based on the transformer architecture.
GPT-3 | 2020 | OpenAI | 175 billion | 499 billion tokens | Public web API | A fine-tuned variant of GPT-3, named GPT-3.5, was made publicly available through a web interface called ChatGPT in 2022.
GPT-Neo | March 2021 | EleutherAI | 2.7 billion | 825 GiB | MIT | The first of the models released by EleutherAI; GPT-Neo outperformed a GPT-3 model of comparable size on some benchmarks but was significantly worse than the largest GPT-3.
GPT-J | June 2021 | EleutherAI | 6 billion | 825 GiB | Apache 2.0 | A GPT-3-style language model.
Claude | December 2021 | Anthropic | 52 billion | 400 billion tokens | Closed beta | Fine-tuned for desirable behavior in conversations.
GLaM (Generalist Language Model) | December 2021 | Google | 1.2 trillion | 1.6 trillion tokens | Proprietary | A sparse mixture-of-experts model, making it more expensive to train but cheaper to run inference with than GPT-3.
LaMDA (Language Models for Dialog Applications) | January 2022 | Google | 137 billion | 1.56 trillion words | Proprietary | Specialized in generating responses in conversations.
Megatron-Turing NLG | October 2021 | Microsoft and Nvidia | 530 billion | 338.6 billion tokens | Restricted web access | Standard architecture, but trained on a supercomputing cluster.
GPT-NeoX | February 2022 | EleutherAI | 20 billion | 825 GiB | Apache 2.0 | Based on the Megatron architecture.
Chinchilla | March 2022 | DeepMind | 70 billion | 1.4 trillion tokens | Proprietary | A model with a reduced parameter count trained on more data.
PaLM (Pathways Language Model) | April 2022 | Google | 540 billion | 768 billion tokens | Proprietary | Aimed at reaching the practical limits of model scale.
OPT (Open Pretrained Transformer) | May 2022 | Meta | 175 billion | 180 billion tokens | Non-commercial research | GPT-3 architecture with some adaptations from Megatron.
YaLM 100B | June 2022 | Yandex | 100 billion | 300 billion tokens | Apache 2.0 | 75% of the training text is in Russian.
BLOOM | July 2022 | Collaboration led by Hugging Face | 175 billion | 350 billion tokens (1.6 TB) | Responsible AI license | Essentially GPT-3, but trained on a multilingual corpus (30% English, excluding programming languages).
AlexaTM (Teacher Models) | November 2022 | Amazon | 20 billion | 1.3 trillion tokens | Public web API | Bidirectional sequence-to-sequence architecture.
LLaMA (Large Language Model Meta AI) | February 2023 | Meta | 65 billion | 1.4 trillion tokens | Non-commercial research | Trained on a large corpus in 20 languages to achieve better performance with fewer parameters.
GPT-4 | March 2023 | OpenAI | No data | No data | Public web API | Available to ChatGPT Plus users; Microsoft has confirmed that Bing Chat uses GPT-4.
StableLM | April 2023 | Stability AI | 7 billion | 800 billion tokens | Source code: Apache 2.0 |

Notes

- The date is that of the first published documentation describing the model's architecture.
- In many cases researchers release or report several versions of a model with different sizes; in those cases the size of the largest model is listed here.
- The license is that of the pretrained model weights; in almost all cases the training code itself is open source or can easily be reproduced.
- OPT: the smaller models, including the 66B version, are publicly available, while the 175B model is available on request.
- LLaMA: Facebook's license and distribution scheme restricted access to approved researchers, but the model weights were leaked and became widely available.
- GPT-4: as stated in the technical report, "Given both the competitive landscape and the safety implications of large-scale models such as GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar."