Live Face De-Identification in Video

Oran Gafni, Lior Wolf
Facebook AI Research and Tel-Aviv University
{oran,wolf}@

Yaniv Taigman
Facebook AI Research
yaniv@

Abstract

We propose a method for face de-identification that enables fully automatic video modification at high frame rates. The goal is to maximally decorrelate the identity, while having the perception (pose, illumination and expression) fixed. We achieve this by a novel feed-forward network architecture that is conditioned on the high-level representation of a person's facial image. The network is global, in the sense that it does not need to be retrained for a given video or for a given identity, and it creates natural-looking image sequences with little distortion in time.
1. Introduction

In consumer image and video applications, the face has a unique importance that stands out from all other objects. For example, face recognition (detection followed by identification) is perhaps much more widely applicable than any other object recognition (categorization, detection, or instance identification) in consumer images. Similarly, putting aside image processing operators that are applied to the entire frame, face filters remain the most popular filters for consumer video. Since face technology is both useful and impactful, it also raises many ethical concerns. Face recognition can lead to loss of privacy, and face replacement technology may be misused to create misleading videos.

In this work, we focus on video de-identification, which is a video filtering application that both requires a technological leap over the current state of the art, and is benign in nature. This application requires the creation of a video of a similar-looking person, such that the perceived identity is changed. This allows, for example, the user to leave a natural-looking video message in a public forum in an anonymous way, that would presumably prevent face recognition technology from recognizing them.

Video de-identification is a challenging task. The video needs to be modified in a seamless way, without causing flickering or other visual artifacts and distortions, such that the identity is changed, while all other factors remain identical, see Fig. 1. These factors include pose, expression, lip positioning (for unaltered speech), occlusion, illumination and shadow, and their dynamics.

Figure 1. De-identification video results demonstrated on a variety of poses, expressions, illumination conditions and occlusions. Pairs of the source frame (first row) and the output frame (second row) are shown. The high-level features (e.g. nose, eyes, eyebrows and mouth) are altered, while the pose, expression, lip articulation, illumination, and skin tone are preserved.
In contrast to the literature methods, which are limited to still images and often swap a given face with a dataset face, our method handles video and generates de novo faces. Our experiments show convincing performance for unconstrained videos, producing natural-looking videos. The person in the rendered video has a similar appearance to the person in the original video. However, a state-of-the-art face recognition network fails to identify the person. A similar experiment shows that humans cannot identify the generated face, even without time constraints.

Our results would not have been possible without a host of novelties. We introduce a novel encoder-decoder architecture, in which we concatenate to the latent space the activations of the representation layer of a network trained to perform face recognition. As far as we know, this is the first time that a representation from an existing classifier network is used to augment an autoencoder, which enables the feed-forward treatment of new persons, unseen during training.

[Table 1 compares the methods Newton '05 [32], Gross '08 [10]†, Samarzija '14 [41], Jourabloo '15 [16]†, Meden '17 [31], Wu '18 [49], Sun '18 [43, 44], and ours, along the criteria: preserves expression; preserves pose; generates new faces; demonstrated on video; demonstrated on a diverse dataset (gender, ethnicity, age, etc.); and the reference to a comparison with ours (Fig. 7 for [41], Fig. 4 for [31], Fig. 8 for [49], Fig. 5 and 14 for [43, 44]).]

Table 1. A comparison to the literature methods. The final row lists the comparison figures in this work. We compare to all methods that provide reasonable quality images in their manuscript, under conditions that are favorable to previous work (we crop the input images from the pdf files, except for the images received from the authors of [43, 44]). † The face is swapped with an average of a few dataset faces.

In addition, this is the first work to introduce a new
type of attractor-repeller perceptual loss term. This term distinguishes between low- and mid-level perceptual terms, and high-level ones. The former are used to tie the output frame to the input video frame, while the latter is used to distance the identity. In this novel architecture, the injection of the representation into the latent space enables the network to create an output that adheres to a complex criterion. Another unique feature is that the network outputs both an image and a mask, which are used, in tandem, to reconstruct the output frame. The method is trained with a specific data augmentation technique that encourages the mapping to be semantic. Additional terms include reconstruction losses, edge losses, and an adversarial loss.
2. Previous Work

Faces have been modeled in computer graphics systems for a long time. In machine learning, faces have been one of the key benchmarks for GAN-based generative models [9, 37, 40] since their inception. High-resolution natural-looking faces were recently generated by training both the generator and the discriminator of the GAN progressively, starting with smaller networks and lower resolutions, and enlarging them gradually [17].

Conditional generation of faces has been a key task in various unsupervised domain translation contributions, where the task is to learn to map, e.g., a person without eyewear to a person with eyeglasses, without seeing matching samples from the two domains [20, 51, 1, 27]. For more distant domain mapping, such as mapping between a face image and a computer graphics avatar, additional supervision in the form of a face descriptor network was used [45]. Our work uses these face descriptors, in order to distance the identity of the output from that of the input.

As far as we know, ours is the first de-identification work to present results on videos. In still images, several methods have been previously suggested. Earlier work implemented different types of image distortions for face de-identification [33, 10], while more recent works rely on techniques for selecting distant faces [41] or averaging/fusing
faces from pre-existing datasets [32, 16, 31]. The experiments conducted by the aforementioned techniques are restricted, in most cases, to low-resolution, black-and-white results. Although it is possible to create eye-pleasing results, they are not robust to different poses, illuminations and facial structures, making them inadequate for video generation. The use of GANs for face de-identification has been suggested [49]. However, the experiments were restricted to a homogeneous dataset, with no apparent expression preservation within the results. In the GAN-based methods of [43, 44], face de-identification is employed for the related task of person obfuscation. The work of [43] conditions the output image on both a blurred version of the input and the extracted facial pose information. The follow-up work [44] combines the GAN-based reconstruction with a parametric face model. As both methods are applied over full upper-body images, they result in low facial-resolution outputs of 64×64. These methods do not preserve expressions, are unsuitable for video, and occasionally produce unnatural outputs.

Tab. 1 provides a comparative view of the literature. The current literature on de-identification often involves face swapping (our method does not). Face swapping, i.e., the replacement of a person's face in an image with another person's face, has been an active topic for some time, starting with the influential work of [3, 2]. Recent contributions have shown a great deal of robustness to the source image, as well as to the properties of the image from which the target face is taken [19, 34]. While these classical face swapping methods work in the pixel space and copy the expression of the target image, a recent deep-learning based work swaps the identity, while maintaining the other aspects of the source image [23]. In comparison to our work, [23] requires training a network for every target person, the transferred expression does not show subtleties (which would be critical, e.g., for a speaking person), and the results are not as natural as ours. These limitations are probably a result of capturing the appearance of the target by restricting the output to be similar, patch by patch, to a collection of patches from the target person. Moreover, [23] is limited to stills and was not demonstrated on video.

Figure 2. (a) The architecture of our network. For conditioning, a pre-trained face recognition network is used. (b) An illustration of the multi-image perceptual loss used, which employs two replicas of the same face recognition network.

The face swapping (FS) project [8] is an unpublished work that replaces faces in video in a way that can be very convincing, given suitable inputs. Unlike our network, FS is retrained for every pair of source-video and target-video persons. The inputs to the FS system, during training, are two large sets of images, one from each identity. In order to obtain good results, thousands of images from each individual, with a significant variability in pose, expression, and illumination, are typically used. In many cases, a large subset of the images of the source person are taken from the video that is going to be converted. In addition, FS often fails, and in order to obtain a convincing output, the person in the source video and the target person need to have a similar facial structure. These limitations make it unsuitable for de-identification purposes.

Like ours, the FS method is based on an encoder-decoder architecture, where both an image and an output mask are produced. A few technical novelties of FS are shared with our work. Most notable is the way in which augmentation is performed in order to train a more semantic network. During the training of FS, the input image is modified by rotating or scaling it, before it is fed to the encoder. The image that the decoder outputs is compared to the undistorted image. Another common property is that the GAN variant used employs virtual examples created using the mixup technique [52]. In addition, in order to maintain the pose and expression, which are considered low- or mid-level features in face descriptors (orthogonal to the identity), FS employs a perceptual loss [15, 47] that is based on the layers of a face recognition network.

Another line of work that manipulates faces in video is face reanimation, e.g., [46]. This line of work reanimates the face in the target video, as controlled by the face in a source video. This does not provide a de-identification solution in the sense that we discuss, since the output video is reanimated in a different scene, and not in the scene of the source video. In addition, it always provides the same output identity.
We do not enforce disentanglement [14, 26, 5] between the latent representation vector Z and the identity, since the network receives the full information regarding the identity via the face descriptor. Therefore, washing out the identity information in Z may not be beneficial. Similarly, the U-Net connection means that identity information can bypass Z. In our method, the removal of identity is not done through disentanglement but via the perceptual loss. As Fig. 9 demonstrates, this loss provides a direct and quantifiable means for controlling the amount of identity information. With disentanglement, this effect would be brittle and sensitive to hyperparameters, as is evident in work where the encoding is set to be orthogonal, even to simple multiclass label information, e.g., [25].
3. Method

Our architecture is based on an adversarial autoencoder [29], coupled with a trained face classifier. By concatenating the autoencoder's latent space with the face classifier's representation layer, we achieve a rich latent space, embedding both identity and expression information. The network is trained in a counter-factual way, i.e., the output differs from the input in key aspects, as dictated by the conditioning. The generation task is, therefore, highly semantic, and the loss required to capture its success cannot be a conventional reconstruction loss.

For the task of de-identification, we employ a target image, which is any image of the person in the video. The method then distances the face descriptors of the output video from those of the target image. The target image does not need to be based on a frame from the input video. This contributes to the applicability of the method, allowing it to be applied to live videos. In our experiments, we do not use an input frame, in order to show the generality of the approach. To encode the target image, we use a pre-trained face classifier network [12], trained over the VGGFace2 dataset [4].

The process during test time is similar to the steps taken in the face swapping literature and involves the following steps: (a) A square bounding box is extracted using the 'dlib' [21] face detector. (b) 68 facial points are detected using [18]. (c) A transformation matrix is extracted, using an estimated similarity transformation (scale, rotation and translation) to an averaged face. (d) The estimated transformation is applied to the input face. (e) The transformed face is passed to the network, together with the representation of the target image, obtaining both an output image and a mask. (f) The output image and mask are projected back, using the inverse of the similarity transformation. (g) We generate an output frame by linearly mixing, per pixel, the input and the network's transformed output image, according to the weights of the transformed mask. (h) The face is merged into the original frame, in the region defined by the convex hull of the facial points.
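The similarity transform of step (c) and the per-pixel blending of step (g) can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's code: the function names, toy array shapes and values are our own.

```python
import numpy as np

def similarity_matrix(scale, theta, tx, ty):
    """Step (c): a 2x3 similarity transform built from scale, rotation
    angle theta (radians) and translation (tx, ty)."""
    c, s = scale * np.cos(theta), scale * np.sin(theta)
    return np.array([[c, -s, tx], [s, c, ty]])

def blend(frame, gen, mask):
    """Step (g): per-pixel linear mix of the input frame and the network's
    output image, weighted by the mask (values in [0, 1]; 1 takes the
    generated pixel, 0 keeps the input pixel)."""
    return mask[..., None] * gen + (1.0 - mask[..., None]) * frame

# toy 4x4 RGB example: a black frame, a white generated image, and a mask
# that mixes the central region at weight 0.5
frame = np.zeros((4, 4, 3))
gen = np.ones((4, 4, 3))
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 0.5
out = blend(frame, gen, mask)
```

In the full pipeline, the matrix from `similarity_matrix` would warp the detected face crop toward the averaged face before step (e), and its inverse would project the output image and mask back in step (f).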
At training time, we perform the following steps: (a) The face image is distorted and augmented. This is done by applying random scaling, rotation and elastic deformation. (b) The distorted image is fed into the network, together with the representation of a target image. During training, we select the same image, undistorted. (c) A combination of the masked output (computed as in step (g) above) and the undistorted input is fed to the discriminator. This is the mixup technique [52] discussed below. (d) Losses are applied to the network's mask and image outputs, as well as to the masked output, as detailed below.

Note that there is a discrepancy between how the network is trained and how it is applied. Not only do we not make any explicit effort to train on videos, the target images are selected in a different way. During training, we extract the identity from the training image itself and not from an independent target image. The method is still able to generalize to perform the real task on unconstrained videos.

3.1. Network architecture

The architecture is illustrated in Fig. 2(a). The encoder is composed of a convolutional layer, followed by five strided, depth-wise separable [6] convolutions with instance normalization [48]. Subsequently, a single fully connected layer is employed, and the target face representation is concatenated. The decoder is composed of a fully connected layer, followed by a lattice of upscale and residual [12] blocks, terminated with a tanh-activated convolution for the output image, and a sigmoid-activated convolution for the mask output. Each upscale block is comprised of a 2D convolution, with twice the number of filters as the input channel size. Following an instance normalization and a LReLU [11] activation, the activations are re-ordered, so that the width and height are doubled, while the channel size is halved. Each residual block input is summed with the output of a Conv2D-LReLU-Conv2D chain.

A low-capacity skip connection [38] is employed (32x32x1), thus relieving the autoencoder's bottleneck, allowing a stronger focus on the encoding of transfer-related information. The connection size does not exceed the bottleneck size (1024) and, due to the distortion of the input image, a collapse into a simple reconstructing autoencoder in early training stages is averted.
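The channel re-ordering inside each upscale block is a depth-to-space shuffle, which can be sketched in numpy. Note how the arithmetic works out: the block's convolution first doubles the channel count (C to 2C), and the 2x2 re-ordering then divides it by 4, so the block as a whole halves the channels while doubling the height and width. The function name and toy shapes below are our own illustration.

```python
import numpy as np

def depth_to_space(x, r=2):
    """Re-order an (H, W, C) activation map so that width and height grow
    by a factor of r while the channel count shrinks by r*r."""
    h, w, c = x.shape
    assert c % (r * r) == 0
    # split each channel index into (row offset, col offset, new channel)
    out = x.reshape(h, w, r, r, c // (r * r))
    # interleave the offsets with the spatial dimensions
    out = out.transpose(0, 2, 1, 3, 4)
    return out.reshape(h * r, w * r, c // (r * r))

# toy map: after the block's convolution, a C=8 map becomes a C=2 map
# at twice the spatial resolution
x = np.arange(2 * 2 * 8, dtype=float).reshape(2, 2, 8)
y = depth_to_space(x)  # shape (4, 4, 2)
```

This is the same operation as PyTorch's `nn.PixelShuffle`, expressed here with explicit reshapes so the index bookkeeping is visible.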
The discriminator consists of four strided convolutions with LReLU activations, with instance normalization applied on all but the first one. A sigmoid-activated convolution yields a single output.

The network has two versions: a lower-resolution version, generating 128x128 images, and a higher-resolution version, generating 256x256 images. The higher-resolution decoder is simplified and enlarged, and consists of a lattice of 6x (Upscale block -> Residual block). Unless otherwise specified, the results presented in the experiments are obtained with the high-res model.

3.2. Training and the Losses Used

For training the networks, except for the discriminator D, we use a compound loss L, which is a weighted sum of multiple parts:

L = α0 L_G + α1 L^r_Raw + α1 L^r_Masked + α2 L^x_Raw + α2 L^y_Raw + α2 L^x_Masked + α2 L^y_Masked + α3 L^p_Raw + α3 L^p_Masked + α4 L^m + α5 L^m_x + α5 L^m_y,

where L_G is the generator's loss, L^r_Raw and L^r_Masked are reconstruction losses for the output image of the decoder z_raw and the version after applying the masking z_masked, L^x and L^y are reconstruction losses applied to the spatial image derivatives, L^p are the perceptual losses, and L^m are regularization losses on the mask. The discriminator network is trained using its own loss L_D. Throughout our experiments, we employ α0 = α1 = α2 = α3 = 0.5, α4 = 3·10^-3, α5 = 10^-2.

To maintain realistic-looking generator outputs, an adversarial loss is used with a combination of example pairs (known as mixup) [52] over a Least Squares GAN [30]:

L_D = ||D(δ_mx) − λ_β||²₂
L_G = α0 ||D(δ_mx) − (1 − λ_β)||²₂

where δ_mx = λ_β·x + (1 − λ_β)·z_masked, and λ_β is sampled out of a Beta distribution, λ_β ∼ Beta(α, α); x is the undistorted input "real" sample, and z_masked is the post-masking generated sample. A value of α = 0.2 is used throughout the experiments.

Additional losses are exercised to both retain source-to-output similarity, yet drive a perceptible transformation. Several losses are distributed equally between the raw and masked outputs, imposing constraints on both. An L1 reconstruction loss is used to enforce pixel-level similarity:

L^r_Raw = α1 ||z_raw − x||₁,  L^r_Masked = α1 ||z_masked − x||₁

where z_raw is the output image itself. This results in a non-trivial constraint, as the encoder input image is distorted. An edge-preserving loss is used to constrain pixel-level derivative differences in both the x and y image axes. It is calculated as the absolute difference between the source and output derivatives in each axis direction, for both the raw and masked outputs:

L^x_Raw = α2 ||z^x_raw − x^x||₁,  L^y_Raw = α2 ||z^y_raw − x^y||₁
L^x_Masked = α2 ||z^x_masked − x^x||₁,  L^y_Masked = α2 ||z^y_masked − x^y||₁

where x^x is the derivative of the undistorted input image x along the x axis, and similarly for the outputs z and for the y axis.

Additional losses are applied to the blending mask m, where 0 indicates that the value of this pixel would be taken from the input image x, 1 indicates taking the value from z_raw, and intermediate values indicate linear mixing. We would like the mask to be both minimal and smooth and, therefore, employ the following losses:

L^m = ||m||₁,  L^m_x = ||m_x||₁,  L^m_y = ||m_y||₁

where m_x and m_y are the spatial derivatives of the mask.

3.2.1 A Multi-Image Perceptual Loss

A new variant of the perceptual loss [15] is employed to maintain the source expression, pose and lighting conditions, while capturing the target identity essence. This is achieved by employing a perceptual loss between the undistorted source and the generated output on several low-to-medium abstraction layers, while distancing the high-abstraction-layer perceptual loss between the target and the generated output.

Let a^{n×n}_r be the activations of an n×n spatial block within the face classifier network for image r, where, in our case, r can be either the input image x, the target image t, the raw output z_raw, or the masked output z_masked. We consider the spatial activation maps of size 112×112, 56×56, 28×28 and 7×7, as well as the representation layer of size 1×1. The lower layers (larger maps) are used to enforce similarity to the input image x, while the 7×7 layer is used to enforce similarity to t, and the 1×1 feature vector is used to enforce dissimilarity to the target image.

Let us define ℓ^{n×n}_{r1,r2} = (1/C_{n×n}) ||a^{n×n}_{r1} − a^{n×n}_{r2}||₁, where C_{n×n} is a normalizing constant, corresponding to the size of the spatial activation map.

Figure 3. Sample results for video de-identification (zoom in). Triplets of source frame, converted frame and target are shown. The modified frame looks similar, but the identity is completely different.

The perceptual loss is given by:

L^p_c = ℓ^{112×112}_{x,z^c} + ℓ^{56×56}_{x,z^c} + ℓ^{28×28}_{x,z^c} + ℓ^{7×7}_{t,z^c} − λ ℓ^{1×1}_{t,z^c}

for c that is either raw or masked, and where λ > 0 is a hyperparameter, which determines the distance of the generated face's high-level features from those of the target image.

The application of the multi-image perceptual loss during training is depicted in Fig. 2(b). During training, the target is the source, and there is only one input image. The resulting image has the texture, pose and expression of the source, but the face is modified to distance the identity. Note that we refer to it as a multi-image perceptual loss, as its aim is to minimize the analog error term during inference (generalization error). However, as a training loss, it is only applied during training, where it receives a pair of images, similar to other perceptual losses.

Note that the perceptual loss terms contain normalizing constants obtained by counting the number of elements. In addition, α0 = α1 = α2 = α3 are simply set to one shared value, and α4, α5 were chosen arbitrarily. Therefore, there is, effectively, only a single important hyperparameter: λ, which provides a direct control of the strength of the identity distancing and requires tuning (see Fig. 9).

At inference time, the network is fed an input frame and a target image. The target image is passed through the face classifier, resulting in a target feature vector, which, in turn, is concatenated to the latent embedding space. Due to the way the network is trained, the decoder will drive the output image away from the target feature vector.
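The attractor-repeller structure of the perceptual loss can be sketched as follows. The toy dictionaries of activation maps stand in for the face classifier's features at each layer size; in the real loss, these would be the activations a^{n×n} of the source x, the target t, and the output z. The function names and toy values are ours.

```python
import numpy as np

def norm_l1(a1, a2):
    """One ℓ term: L1 difference of two activation maps, divided by a
    normalizing constant equal to the map's number of elements."""
    return float(np.abs(a1 - a2).sum()) / a1.size

def multi_image_perceptual_loss(acts_x, acts_t, acts_z, lam):
    """Attractor-repeller loss: the 112/56/28 layers pull the output z
    toward the source x, the 7x7 layer pulls it toward the target t, and
    the 1x1 identity vector is pushed away from t, scaled by lam."""
    loss = sum(norm_l1(acts_x[k], acts_z[k]) for k in ("112", "56", "28"))
    loss += norm_l1(acts_t["7"], acts_z["7"])
    loss -= lam * norm_l1(acts_t["1"], acts_z["1"])
    return loss

# toy activations: each attractor term contributes 1, the 7x7 term is 0,
# and the repeller term subtracts lam * 1
zeros, ones = np.zeros((4, 4)), np.ones((4, 4))
acts_x = {"112": zeros, "56": zeros, "28": zeros}
acts_z = {"112": ones, "56": ones, "28": ones, "7": ones, "1": ones}
acts_t = {"7": ones, "1": zeros}
loss = multi_image_perceptual_loss(acts_x, acts_t, acts_z, lam=0.5)
```

The minus sign on the 1x1 term is what makes the loss a repeller for identity: gradient descent increases, rather than decreases, the distance between the output's identity vector and the target's.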
Video  lower   higher
1      28.7%   34.2%
2      66.7%   45.8%
3      61.9%   64.3%
4      52.4%   62.1%
5      42.9%   43.8%
6      47.6%   27.0%
7      57.1%   56.8%
8      71.4%   73.5%

Table 2. (a) Video user study: the success rate in user identification of a real video from a modified one, for both the lower-resolution and higher-resolution models. Closer to 50% is better. (b) Each column is a different individual from the still-image user study. [Row 1] The gallery images, i.e., the album images the users were asked to select the identity from. [Row 2] The input images. [Row 3] The de-identified version of [Row 2]. (c) The confusion matrix in identifying the five persons for the real images (control). (d) The confusion matrix for identifying, based on the de-identified images.

Person in  Method    RGB values  Face desc.
Row 1      [41]      5.46        1.21
Row 1      Our high  2.72        1.50
Row 2      [41]      4.91        1.35
Row 2      Our high  2.35        1.53
Row 3      [41]      4.51        1.20
Row 3      Our high  3.92        1.32

Table 3. The distance between the original and the de-identified image, for the images in Fig. 7. Our method results in lower pixel differences, but with face descriptor distances that are higher.
4. Experiments

Training is performed using the Adam [22] optimizer, with the learning rate set to 10^-4, β1 = 0.5, and β2 = 0.99. At each training iteration, a batch of 32 images for the lower-resolution model, and 64 for the higher-resolution model, is randomly selected and augmented. We initialize all convolutional weights using a random normal distribution, with a mean of 0 and a standard deviation of 0.02. Bias weights are not used. The decoder includes LReLU activations with α = 0.2 for residual blocks, and α = 0.1 otherwise. The lower-resolution network was trained on a union of LFW [13], CelebA [28] and PubFig [24], totaling 260,000 images, the vast majority from CelebA. The identity information is not used during training. The higher-resolution network was trained on a union of CelebA-HQ [17] and faces extracted out of the 1,000 source videos used by [39], resulting in 500,000 images. Training was more involved for the lower-resolution model, and it was trained for 230k iterations with a gradually increasing strength of the hyperparameter λ, ranging from λ = 1·10^-7 to λ = 2·10^-6, in four steps. Without this gradual increase, the naturalness of the generated face is diminished. For the higher-resolution model, 80k iterations with a fixed λ = 2·10^-6 were sufficient.

Sample results are shown in Fig. 3. In each column, we show the original frame, the modified (output) frame, and the target image from which the identity was extracted. As can be seen, our method produces natural-looking images that match the input frame. Identity is indeed modified, while the other aspects of the frame are maintained.

The supplementary media contains sample videos, with significant motion, pose, expression and illumination changes, to which our method was applied. It is evident that the method can deal with videos, without causing motion- or instability-based distortions. This is despite being strictly based on per-frame analysis.

It is also evident that the lower-resolution model seems blurry at times. This is a consequence of the fixed resolution and not of the generated image, which is in fact sharp. The higher-resolution model clearly provides more pleasing results, when the required resolution is high.
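The gradual λ schedule for the lower-resolution model can be sketched as a small helper. The paper states only that λ grows from 1·10^-7 to 2·10^-6 in four steps over 230k iterations; the equal step boundaries and the geometric spacing between levels below are our own assumptions.

```python
def lambda_schedule(iteration, total=230_000,
                    lam_start=1e-7, lam_end=2e-6, steps=4):
    """Piecewise-constant schedule: lambda grows from lam_start to lam_end
    in `steps` stages across training. The stage boundaries (equal-length)
    and geometric interpolation are assumptions, not from the paper."""
    stage = min(int(steps * iteration / total), steps - 1)
    # geometric interpolation between the start and end values
    ratio = (lam_end / lam_start) ** (stage / (steps - 1))
    return lam_start * ratio
```

A schedule like this would be queried once per iteration and its value plugged into the repeller term of the perceptual loss.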
To test the naturalness of the approach, we tested the ability of humans to discriminate between videos that were modified and those that were not. Although the human observers (n = 20) were fully aware of the type of manipulation that the videos had undergone, the human performance was close to random, with an average success rate of 53.6% (SD = 13.0%), see Tab. 2(a). In order to avoid a decision based on a familiar face, this was evaluated on a non-celebrity dataset created specifically for this purpose, which contained 8 videos.

Familiar identities can often be recognized by non-facial cues. To establish that, given a similar context around a facial identity (e.g. hair, gender, ethnicity), the perceived identity is shifted in a way that is almost impossible to place, we considered images of five persons of the same ethnicity and similar hairstyles from a TV show, and collected two sets of images: reference (gallery) and source. The source images were modified by our method, using them as targets as well, see Tab. 2(b). As can be seen in the confusion matrix of Tab. 2(c), the users could easily identify the correct gallery images, based on the source images. However, as Tab. 2(d) indicates, post de-identification, the answers had little correlation with the true identity, as desired.

Person              Lower-res de-ID model     Higher-res de-ID model
                    Median  Mean±SD           Median  Mean±SD
Simone Biles        1730    2400.6±2142       1725    2223±1814
Billyan             3156    3456.3±2601       901     1334±1518
Selena Gomez        2256    2704±1873         8058    8110±2186
Scarlett Johansson  9012    7753.5±3112       4493    4830±2544
Steven Yeun         5806    4976.2±3167       1069    1814±2544
Sarah J. Parker     679     1069.3±1096       408     620±665
Average             3773    3726              2776    3155

(For the original, unmodified frames, the true identity is typically ranked first, with a median rank of 1; the per-person mean values of this column are not legible in this copy.)

Table 4. Ranking of the true identity out of a dataset of 54,000 persons (SD = Standard Deviation). Evaluation is performed on the pre-trained LResNet50E-IR network. Results are given for both the lower- and higher-resolution models.

In order to automatically quantify the performance of our de-identification method, we applied a state-of-the-art face recognition network, namely, the ArcFace [7] network. This network was selected both for its performance, and for its dissimilarity from the face recognition network used as part of our network, in both the training set and the loss.

The results of the automatic identification are presented in Tab. 4 for both the lower-resolution and the higher-resolution models. Identification is performed out of the 54,000 persons in the ArcFace verification set. The table reports the rank of the true person out of all persons, when sorting the softmax probabilities that the face recognition network produces. The ranking of the true identity in the original videos shows an excellent recognition capability, with most of the frames identifying the correct person as the rank-1 result. For the de-identified frames, despite the large similarity between the original and the modified frames (Fig. 3), the rank is typically in the thousands.

Another automatic face recognition experiment is conducted on the LFW benchmark [13]. Tab. 5 presents the results on de-identified LFW image pairs for a given person (de-identification was applied to the second image of each pair), for two FaceNet [42] models. The true positive rate for the LFW benchmark drops from almost 0.99 to less than 0.04 after applying de-identification.

An additional experiment, evaluating our method on the LFW benchmark, can be found in the appendix.
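The rank statistic reported in Tab. 4 can be computed as below: sort the face network's per-identity softmax probabilities in decreasing order and find the position of the true identity. The helper name and the toy probability values are our own illustration.

```python
def true_identity_rank(probs, true_id):
    """Rank of the true person when the face network's softmax
    probabilities are sorted in decreasing order; rank 1 means the frame
    is correctly identified."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    return order.index(true_id) + 1

# toy 4-identity gallery; the scores are made up for illustration
probs = [0.05, 0.70, 0.20, 0.05]
rank_hit = true_identity_rank(probs, 1)   # top-scoring identity -> rank 1
rank_miss = true_identity_rank(probs, 2)  # second-best identity -> rank 2
```

In the experiment, this rank is computed per frame against all 54,000 gallery identities, and the table aggregates the per-person median and mean.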
A comparison of our method with the recent work of [31] is given in Fig. 4. This method relies on the generation of a new identity, given the k-closest identities, as selected by a trained CNN feature-extractor. As can be seen, this can result in the same rendered identity for multiple inputs, and it does not maintain the expression, illumination and skin tone.

FaceNet Model  Original     De-ID
VGGFace2       0.986±0.010  0.038±0.015
CASIA          0.965±0.016  0.035±0.011

Table 5. Results on the LFW benchmark, employing the FaceNet network trained on VGGFace2 or CASIA-WebFace. Shown is the True Positive Rate for a False Acceptance Rate of 0.001.

To emphasize the ability of identity distancing, while maintaining pixel-space similarity, we compare our method to [41]. While the method of [41] relies on finding a dissimilar identity within a given dataset, ours is single-image dependent, in the sense that it does not rely on other images within a dataset. It is, therefore, resilient to different poses, expressions, lighting conditions and face structures. Given the figures provided in the work of [41], we compare our generated outputs by the high-level perceptual distance from the source face, taking into account pixel-level similarity (Fig. 7). A comparison of the distance between the original and the de-identified image for the two methods (Tab. 3) reveals that our method results in lower pixel differences, yet with face descriptor distances that are higher.

A comparison with the work of [49] is given in Fig. 8. Our results are at least as good as the original ones, despite having to run on the cropped faces extracted from the paper PDF. Although [49] presents visually pleasing results, they do not maintain low-level and medium-level features, including mouth expression and facial hair. In addition, the work of [49] presents results on low-resolution black-and-white images only, with no pose or gender variation.

Fig. 5 compares with the recent work of [43, 44]. Our method is able to distance the identity in a more subtle way, while introducing fewer artifacts. Our generated image contains only the face, which is enabled by the use of the mask.

Figure 4. (a) Input images from [31], (b) our results, (c) those of [31]. Our method maintains the expression, pose, and illumination. Furthermore, our work does not assign the same new identity to different persons.
Figure 5. (a) Input images from [43, 44], (b) our results, (c) those of [43] (row 1) and [44] (rows 2-3).

Figure 6. De-identification applied to the examples labeled as very challenging in the NIST Face Recognition Challenge [36].

Their method generates both the face and the upper body using the same 256×256 generation resolution, which makes our results of a much higher effective resolution. A full set of results is given in the appendix, Fig. 14.

To further demonstrate the robustness of our method, we applied our technique to images copied directly from the very difficult inputs of [36]. As can be seen in Fig. 6, our method is robust to very challenging illuminations.

To demonstrate the control of the hyperparameter λ over the identity distance, we provide a sequence of generated images, where each trained model is identical, apart from the strength of λ. The incremental shift in identity can be seen in Fig. 9. Ablation analyses are given in the appendix. They compare various variants of our method, and depict the artifacts introduced by removing parts of it.
5. Conclusions

Recent world events concerning the advances in, and abuse of, face recognition technology invoke the need to understand methods that successfully deal with de-identification. Our contribution is the only one suitable for video, including live video, and presents quality that far surpasses the literature methods. The approach is both elegant and markedly novel, employing an existing face descriptor concatenated to the embedding space, a learned mask for blending, and a new type of perceptual loss for achieving the desired effect, among a few other contributions.

Minimally changing the image is important for the method to be video-capable, and is also an important factor in the creation of adversarial examples [35]. Unlike adversarial examples, in our work, this change is measured using low- and mid-level features and not using norms on the pixels themselves. It was recently shown that image perturbations caused by adversarial examples distort mid-level features [50], which we constrain to remain unchanged.

Figure 7. Comparison with [41] (from the paper's sample image). (a) Original image (also used for the target of our method). (b) Our generated output. (c) Result of [41]. (d) Target used by [41].

Figure 8. Comparison with [49]. Row 1 - Original images. Row 2 - results of [49]. Row 3 - Our generated outputs. The previous work does not maintain mouth expression or facial hair.

Figure 9. Incrementally growing λ in the lower-resolution model. A gradual identity shift can be observed. (a) Source. (b) λ = −5·10^-7. (c) λ = −1·10^-6. (d) λ = −2·10^-6.

References

[1] Sagie Benaim and Lior Wolf. One-sided unsupervised domain mapping. In NIPS, 2017.
[2] Dmitri Bitouk, Neeraj Kumar, Samreen Dhillon, Peter Belhumeur, and Shree K. Nayar. Face swapping: Automatically replacing faces in photographs. In SIGGRAPH, 2008.
[3] Volker Blanz, Kristina Scherbaum, Thomas Vetter, and Hans-Peter Seidel. Exchanging faces in images. In Computer Graphics Forum, volume 23, pages 669–676. Wiley Online Library, 2004.
[4] Qiong Cao, Li Shen, Weidi Xie, Omkar M Parkhi, and Andrew Zisserman. VGGFace2: A dataset for recognising faces across pose and age. arXiv preprint arXiv:1710.08092, 2017.
[5] Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In NIPS, 2016.
[6] Francois Chollet. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1251–1258, 2017.
[7] Jiankang Deng, Jia Guo, and Stefanos Zafeiriou. ArcFace: Additive angular margin loss for deep face recognition. arXiv preprint arXiv:1801.07698, 2018.
[8] Faceswap. Github project, /deepfakes/faceswap, 2017.
[9] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NIPS, 2014.
[10] Ralph Gross, Latanya Sweeney, Fernando De La Torre, and Simon Baker. Semi-supervised learning of multi-factor models for face de-identification. In 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008.
[11] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In ICCV, 2015.
[12] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
[13] Gary B Huang, Manu Ramesh, Tamara Berg, and Erik Learned-Miller. Labeled Faces in the Wild: A database for studying face recognition in unconstrained environments. Technical report.
[14] Xun Huang, Ming-Yu Liu, Serge Belongie, and Jan Kautz. Multimodal unsupervised image-to-image translation. In ECCV, 2018.
[15] Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In ECCV, 2016.
[16] Amin Jourabloo, Xi Yin, and Xiaoming Liu. Attribute preserved face de-identification. In ICB, 2015.
[17] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. In ICLR, 2018.
[18] Vahid Kazemi and Josephine Sullivan. One millisecond face alignment with an ensemble of regression trees. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1867–1874, 2014.
[19] Ira Kemelmacher-Shlizerman. Transfiguring portraits. ACM Trans. Graph., 35(4), 2016.
[20] Taeksoo Kim, Moonsu Cha, Hyunsoo Kim, Jungkwon Lee, and Jiwon Kim. Learning to discover cross-domain relations with generative adversarial networks. arXiv preprint arXiv:1703.05192, 2017.
[21] Davis E King. Dlib-ml: A machine learning toolkit. Journal of Machine Learning Research, 10:1755–1758, 2009.
[22] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
[23] Iryna Korshunova, Wenzhe Shi, Joni Dambre, and Lucas Theis. Fast face-swap using convolutional neural networks. In The IEEE International Conference on Computer Vision, 2017.
[24] Neeraj Kumar, Alexander C Berg, Peter N Belhumeur, and Shree K Nayar. Attribute and simile classifiers for face verification. In CVPR, 2009.
[25] Guillaume Lample et al. Fader networks: Manipulating images by sliding attributes. In NIPS, 2017.
[26] Hsin-Ying Lee, Hung-Yu Tseng, Jia-Bin Huang, Maneesh Singh, and Ming-Hsuan Yang. Diverse image-to-image translation via disentangled representations. In The European Conference on Computer Vision (ECCV), September 2018.
[27] Ming-Yu Liu, Thomas Breuel, and Jan Kautz. Unsupervised image-to-image translation networks. In NIPS, 2017.
[28] Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In ICCV, 2015.
[29] Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, and Brendan Frey. Adversarial autoencoders. arXiv preprint arXiv:1511.05644, 2015.
[30] Xudong Mao, Qing Li, Haoran Xie, Raymond Y K Lau, Zhen Wang, and Stephen Paul Smolley. Least squares generative adversarial networks. In ICCV, 2017.
[31] Blaž Meden, Refik Can Mallı, Sebastjan Fabijan, Hazım Kemal Ekenel, Vitomir Štruc, and Peter Peer. Face deidentification with generative deep neural networks. IET Signal Processing, 11(9):1046–1054, 2017.
[32] Elaine M Newton, Latanya Sweeney, and Bradley Malin. Preserving privacy by de-identifying face images. IEEE Transactions on Knowledge and Data Engineering, 17(2):232–243, 2005.
[33] Elaine M Newton, Latanya Sweeney, and Bradley Malin. Preserving privacy by de-identifying face images. IEEE Transactions on Knowledge and Data Engineering, 17(2):232–243, 2005.
[34] Yuval Nirkin, Iacopo Masi, Anh Tuan Tran, Tal Hassner, and Gerard Medioni. On face segmentation, face swapping, and face perception. arXiv preprint arXiv:1704.06729, 2017.
[35] Seong Joon Oh, Mario Fritz, and Bernt Schiele. Adversarial image perturbation for privacy protection: a game theory perspective. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 1491–1500. IEEE, 2017.
[36] P Jonathon Phillips, J Ross Beveridge, Bruce A Draper, Geof Givens, Alice J O'Toole, David S Bolme, Joseph Dunlop, Yui Man Lui, Hassan Sahibzada, and Samuel Weimer. An introduction to the good, the bad, & the ugly face recognition challenge problem. In Automatic Face & Gesture Recognition, 2011.
[37] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
[38] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer, 2015.
[39] Andreas Rössler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, and Matthias Nießner. FaceForensics: A large-scale video dataset for forgery detection in human faces. arXiv, 2018.
[40] Tim Salimans, Ian J. Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training GANs. arXiv preprint arXiv:1606.03498, 2016.
[41] Branko Samarzija and Slobodan Ribaric. An approach to the de-identification of faces in different poses. In 2014 37th
InternationalConventiononInformationandCommunicationTechnology,ElectronicsandMicroelectronics(MIPRO),pages1246–1251.IEEE,2014.2,6,7,8[42]FlorianSchroff,DmitryKalenichenko,andJamesPhilbin.:Aunifiedembeddingforfacerecognitionandclustering.InProceedingsoftheIEEEconferenceputervisionandpatternrecognition,pages815–823,2015.7[43]QianruSun,LiqianMa,SeongJoonOh,LucVanGool,BerntSchiele,andMarioFritz.Naturalandeffectiveobfuscationbyheadinpainting.InProceedingsoftheIEEEConferenceonComputerVisionandPatternRecognition,pages5050–5059,2018.2,7,8[44]QianruSun,AyushTewari,WeipengXu,MarioFritz,ChristianTheobalt,andBerntSchiele.Ahybridmodelforidentityobfuscationbyfacereplacement.InProceedingsoftheEuropeanConferenceonComputerVision(ECCV),pages553–569,2018.2,7,8,12,13[45]YanivTaigman,AdamPolyak,andLiorWolf.Unsupervisedcross-domainimagegeneration.InInternationalConferenceonLearningRepresentations(ICLR),2017.2[46]JustusThies,MichaelZollhofer,MarcStamminger,ChristianTheobalt,andMatthiasNießner.Face2face:Real-timefacecaptureandreenactmentofrgbvideos.InProceedingsofthe IEEEConferenceonComputerVisionandPatternRecognition,pages2387–2395,2016.3[47]DmitryUlyanov,VadimLebedev,VictorLempitsky,etal.works:Feed-forwardsynthesisoftexturesandstylizedimages.InICML,2016.3[48]DmitryUlyanov,AndreaVedaldi,andVictorLempitsky.Instancenormalization:Themissingingredientforfaststylization.arXivpreprintarXiv:1607.08022,2016.4[49]YifanWu,FanYang,andHaibinLing.Privacy-protective-ganforfacede-identification.arXivpreprintarXiv:1806.08906,2018.2,7,
8 [50]CihangXieetal.Featuredenoisingforimprovingadversarialrobustness.arXivpreprintarXiv:1812.03411,2018.8 [51]ZiliYi,HaoZhang,PingTan,andMinglunGong.DualGAN:Unsupervisedduallearningforimage-to-imagetranslation.arXivpreprintarXiv:1704.02510,2017.2 [52]HongyiZhang,MoustaphaCisse,YannNDauphin,andDavidLopez-Paz.mixup:Beyondempiricalriskminimization.arXivpreprintarXiv:1710.09412,2017.3,
4 9387
