
Deep Residual Learning for Image Recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
Microsoft Research
{kahe, v-xiangz, v-shren, jiansun}@microsoft.com

Abstract

Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers, 8× deeper than VGG nets [41] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers.

The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to the ILSVRC & COCO 2015 competitions¹, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

¹ http://image-net.org/challenges/LSVRC/2015/ and http://mscoco.org/dataset/#detections-challenge2015.

1. Introduction

Deep convolutional neural networks [22, 21] have led to a series of breakthroughs for image classification [21, 50, 40]. Deep networks naturally integrate low/mid/high-level features [50] and classifiers in an end-to-end multi-layer fashion, and the "levels" of features can be enriched by the number of stacked layers (depth). Recent evidence [41, 44] reveals that network depth is of crucial importance, and the leading results [41, 44, 13, 16] on the challenging ImageNet dataset [36] all exploit "very deep" [41] models, with a depth of sixteen [41] to thirty [16]. Many other non-trivial visual recognition tasks [8, 12, 7, 32, 27] have also greatly benefited from very deep models.

[Figure 1. Training error (left) and test error (right) on CIFAR-10 with 20-layer and 56-layer "plain" networks; both panels plot error (%) against iter. (1e4). The deeper network has higher training error, and thus test error. Similar phenomena on ImageNet are presented in Fig. 4.]

Driven by the significance of depth, a question arises: Is learning better networks as easy as stacking more layers? An obstacle to answering this question was the notorious problem of vanishing/exploding gradients [1, 9], which hamper convergence from the beginning. This problem, however, has been largely addressed by normalized initialization [23, 9, 37, 13] and intermediate normalization layers [16], which enable networks with tens of layers to start converging for stochastic gradient descent (SGD) with backpropagation [22].

When deeper networks are able to start converging, a degradation problem has been exposed: with the network depth increasing, accuracy gets saturated (which might be unsurprising) and then degrades rapidly. Unexpectedly, such degradation is not caused by overfitting, and adding more layers to a suitably deep model leads to higher training error, as reported in [11, 42] and thoroughly verified by our experiments. Fig. 1 shows a typical example.

The degradation (of training accuracy) indicates that not all systems are similarly easy to optimize. Let us consider a shallower architecture and its deeper counterpart that adds more layers onto it. There exists a solution by construction to the deeper model: the added layers are identity mapping, and the other layers are copied from the learned shallower model. The existence of this constructed solution indicates that a deeper model should produce no higher training error than its shallower counterpart. But experiments show that our current solvers on hand are unable to find solutions that are comparably good or better than the constructed solution (or unable to do so in feasible time).
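The excerpt ends where the paper turns to its remedy for this degradation: let the stacked layers fit a residual function with reference to the layer input, and add the input back through an identity shortcut. As a rough illustration of that idea, the sketch below builds one residual block in PyTorch. The framework choice, the channel count, and the specific layer arrangement (two 3×3 convolutions with batch normalization) are illustrative assumptions for this example, not details given in the excerpt above.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: output = F(x) + x, where F is a small
    stack of conv layers. If the weights of F are driven toward zero,
    the block reduces to the identity mapping that the constructed
    deeper model in the introduction relies on."""

    def __init__(self, channels: int):
        super().__init__()
        # Two 3x3 convolutions; the channel count is kept constant so the
        # identity shortcut can be added without any projection.
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))  # first half of the residual F(x)
        out = self.bn2(self.conv2(out))           # second half of F(x)
        return self.relu(out + x)                 # F(x) + x via the identity shortcut

if __name__ == "__main__":
    # Quick shape check on a random feature map (hypothetical sizes).
    block = ResidualBlock(channels=64)
    x = torch.randn(1, 64, 56, 56)
    print(block(x).shape)  # torch.Size([1, 64, 56, 56])

Because the shortcut is a plain identity, it adds no extra parameters or computation, which is consistent with the abstract's point that very deep residual nets can still have lower complexity than shallower plain architectures such as VGG.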
