
Attention Is All You Need

Ashish Vaswani∗ (Google Brain) avaswani@google.com
Noam Shazeer∗ (Google Brain) noam@google.com
Niki Parmar∗ (Google Research) nikip@google.com
Jakob Uszkoreit∗ (Google Research) usz@google.com
Llion Jones∗ (Google Research) llion@google.com
Aidan N. Gomez∗† (University of Toronto) aidan@cs.toronto.edu
Łukasz Kaiser∗ (Google Brain) lukaszkaiser@google.com
Illia Polosukhin∗‡ illia.polosukhin@gmail.com

Abstract

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.

1 Introduction

Recurrent neural networks, long short-term memory [13] and gated recurrent [7] neural networks in particular, have been firmly established as state of the art approaches in sequence modeling and transduction problems such as language modeling and machine translation.

∗ Equal contribution. Listing order is random. Jakob proposed replacing RNNs with self-attention and started the effort to evaluate this idea. Ashish, with Illia, designed and implemented the first Transformer models and has been crucially involved in every aspect of this work. Noam proposed scaled dot-product attention, multi-head attention and the parameter-free position representation and became the other person involved in nearly every detail. Niki designed, implemented, tuned and evaluated countless model variants in our original codebase and tensor2tensor. Llion also experimented with novel model variants, was responsible for our initial codebase, and efficient inference and visualizations. Łukasz and Aidan spent countless long days designing various parts of and implementing tensor2tensor, replacing our earlier codebase, greatly improving results and massively accelerating our research.
† Work performed while at Google Brain.
‡ Work performed while at Google Research.

31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.
arXiv:1706.03762v5 [cs.CL] 6 Dec 2017
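The footnote above credits the scaled dot-product attention that the Transformer is built on; the paper later defines it as Attention(Q, K, V) = softmax(QKᵀ/√d_k)·V. For readers of this first-page excerpt, here is a minimal NumPy sketch of that formula; the function name and toy shapes are illustrative, not from the paper:

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """softmax(Q K^T / sqrt(d_k)) V with unbatched toy shapes:
        Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v)."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                 # (n_q, n_k) similarity scores
        scores -= scores.max(axis=-1, keepdims=True)    # shift for numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        return weights @ V                              # (n_q, d_v) weighted values

    # Toy usage: 4 queries attending over 6 key/value pairs of width 8.
    rng = np.random.default_rng(0)
    Q, K = rng.normal(size=(4, 8)), rng.normal(size=(6, 8))
    V = rng.normal(size=(6, 8))
    print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)

Dividing by √d_k keeps the dot products from growing with the key width, which would otherwise push the softmax into regions with very small gradients; multi-head attention, also credited above, runs several such attentions in parallel over learned projections of Q, K and V and concatenates the results.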
