Journal of Computer and Communications, 2014, 2, 19-26
Published Online January 2014. 2014.21004

Application of Association Rule Mining Theory in Sina Weibo

Xiao Cui, Hao Shi, Xun Yi
College of Engineering and Science, Victoria University, Melbourne, Australia.
Email: xiao.cui1@live.vu.edu.au

Received November 21st, 2013; revised December 17th, 2013; accepted December 24th, 2013

Copyright © 2014 Xiao Cui et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. In accordance with the Creative Commons Attribution License all Copyrights © 2014 are reserved for SCIRP and the owner of the intellectual property Xiao Cui et al. All Copyright © 2014 are guarded by law and by SCIRP as a guardian.

ABSTRACT
A user profile contains information about a user. Substantial effort has been made to understand users' behavior by analyzing their profile data, and online social networks provide an enormous amount of such information for researchers. Sina Weibo, a Twitter-like microblogging platform, has achieved great success in China, although studies on it are still at an early stage. This paper explores the relationships among different profile attributes in Sina Weibo. We use association rule mining to identify the dependencies among the attributes, and we find that if a user's posts are welcomed, he or she is more likely to have a large number of followers. Our results demonstrate how the relationships among the profile attributes are affected by a user's verified type. We also put some effort into data transformation and analyze the influence of the statistical properties of the data distribution on data discretization.

KEYWORDS
Association Rules; User Profiles; Sina Weibo; Social Network
1. Introduction
Online social networks such as Facebook, Twitter and Google+ have become an integral part of people's daily lives. However they differ from one another, user profiles are a key feature of all of them. A user profile may include, but is not limited to, gender, age, location, occupation and social contacts. The availability of this information varies from one site to another. Although user profiles are less dynamic than other online behaviors, they still provide a clear signal of users' characteristics. Substantial effort has recently been made to obtain knowledge about users from their profile data. Lampe et al. [1] found that the completion percentage of a profile on Facebook has a positive relationship with the number of friends a user has. Mislove et al. [2] proposed an algorithm to infer the missing part of a user profile from other, similar profiles. Quercia et al. [3] studied the relationship between the Big Five personality traits and user behaviors on Twitter, and introduced a novel method to predict personality from the numbers of followers, followings and tweets a user has.

As Twitter is banned in China, Sina Weibo is considered a replacement for it. Sina Weibo has reached 56 million daily active users, who spend an average of one hour per day on the service [4], and it has had a significant influence on Chinese society. Unlike its predecessors, however, Sina Weibo has so far received relatively little research attention, and only a few studies have examined its user profiles. Guo et al. [5] found that the connections between users are mostly one-way and that the number of followers a user has changes very quickly. Chen and She [6] carried out a similar study and compared verified users with unverified ones; they argued that users whose real identity has been verified are more likely to have greater influence. Wang et al. [7] examined the correlation among the numbers of followers, followings and posts. They found that the number of followers grows rapidly as the number of followings increases from 10 to 3000, and that an increase in posts can lead to more followers as long as the number of posts does not exceed 20,000.

Although considerable attention has been paid to Sina Weibo, associations among different profile attributes, such as the association between the number of reposts and the number of comments, have not yet been well examined. Because a large number of Sina Weibo users have been verified according to their professional background, people on Sina Weibo are more likely to act responsibly and engage honestly with the community. It is therefore worthwhile to explore the characteristics of Sina Weibo users, especially considering that they have different verified types (e.g., local authorities, news agencies and celebrities). Our research is based on a set of first-hand data collected from Sina Weibo, containing 1,192,972 user profiles. The major contributions are summarized as follows:
• Continuous data (e.g., the number of followers) are replaced by meaningful labels (e.g., the grassroots and social star).
• The influence of the data distribution on data discretization is analyzed.
• Association rule mining is conducted with respect to users' verified types.
• A comparison between different types of users is made.

The rest of the paper is organized as follows. Section 2 presents the data model used in this paper and gives definitions such as the number of followings a user has. Section 3 explains the process of data collection and illustrates the social relationships among users in Sina Weibo. Section 4 discusses the methods for data discretization, taking the statistical properties of the data distribution into consideration. Section 5 introduces an Apriori-based method for association rule mining and explains how we conduct association rule mining on Sina Weibo. Empirical results are given in Section 6, and conclusions are drawn in the last section.
2. Data Model
The information in a user profile may include various attributes of a user, such as geographical location, academic and professional background, interests and preferences. The availability of such information varies from one site to another. On a microblogging service such as Sina Weibo, the numbers of followers, followings and posts a user has are three indispensable parts of a user profile, and they are always displayed prominently. In addition, a verified type is added to a user profile, because users on Sina Weibo may choose to verify their identity based on their professional background. In this paper, a user profile is defined as follows:

profile(uid) = {username, province, gender, number of followers, number of followings, number of posts, number of reposts, number of comments, verified type, time since created}

Each user has a unique identification number (uid). The core attributes of a profile are defined as follows:
• NoA refers to the number of followers. NoA(uid) is the total number of audience members who are listening to the broadcast of user uid. NoA is one of the major signs of a user's popularity.
• NoB refers to the number of followings. NoB(uid) is the total number of broadcasts to which user uid is listening. Sina Weibo allows a user to listen to at most 2000 broadcasts.
• NoP refers to the number of posts. NoP(uid) is the total number of posts that user uid publishes. NoP can be a good indicator of a user's activity.
• NoR refers to the number of reposts. NoR(uid) is the total number of reposts that others forward from user uid. NoR is a sign of a user's capability to spread information.
• NoC refers to the number of comments. NoC(uid) is the total number of comments others leave on the posts of user uid. NoC can reveal the likelihood of a user initiating a topic.
• VT refers to the verified type. VT includes red star (an ordinary user whose real identity is verified), beauty, e-celebrity, corporation, government, media, organization, campus, application software and website [8]. For example, the user Xinhua News Agency, the official press agency of China, is classed as media.
• TsC refers to the time since the account was created.
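The profile tuple defined above maps naturally onto a small record type. The following is a minimal Python sketch, not the authors' schema: the field names (`noa`, `nob`, and so on) are hypothetical stand-ins for NoA, NoB, NoP, NoR, NoC, VT and TsC, and representing an unverified user with `verified_type=None` and TsC in days are our assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Profile:
    """One Sina Weibo user profile, mirroring profile(uid) in Section 2.

    Field names are hypothetical; the comments map them to the paper's notation.
    """
    uid: int                      # unique identification number
    username: str
    province: str
    gender: str
    noa: int                      # NoA: number of followers
    nob: int                      # NoB: number of followings
    nop: int                      # NoP: number of posts
    nor: int                      # NoR: reposts others forwarded from this user
    noc: int                      # NoC: comments others left on this user's posts
    verified_type: Optional[str]  # VT, e.g. "red star"; None if unverified (assumed)
    tsc_days: int                 # TsC: time since the account was created (days assumed)

    def __post_init__(self) -> None:
        # Sina Weibo lets a user listen to at most 2000 broadcasts (Section 2).
        if not 0 <= self.nob <= 2000:
            raise ValueError("NoB must lie in [0, 2000]")

    @property
    def is_verified(self) -> bool:
        return self.verified_type is not None
```

Freezing the record reflects that profiles are treated as snapshots once collected; the range check encodes the one platform constraint the data model states explicitly.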
3. Data Collection
Users' profiles are collected through the REST API provided by Sina Weibo, and bilateral relationships are used to expand the search for new users. Social relationships among users are defined as follows (see Figure 1). Scenario 1 indicates that uid1 and uid2 have no connection between them. Scenario 2 shows that uid1 is a follower of uid2. Scenario 3 describes a bilateral friendship, in which uid1 is a follower of uid2 and uid2 is also a follower of uid1. We assume that if two users follow each other, they are friends.

Figure 1. Social relationships.

Getting the friends of a friend is the strategy used in this paper to obtain user ids from Sina Weibo. The REST API provides facilities to retrieve profile information according to a user's id. The implementation details are given in Table 1.

Unlike studies [4-6], where a user's followings are used to expand the search for new users, bilateral relationships are used in this study. Users who follow each other tend to have a closer relationship. This method also keeps the search for new users away from spammers: few users subscribe to a spammer's microblog, so spammers rarely form bilateral relationships.

Finally, 1,192,972 user profiles are retrieved, 39.58% of which belong to verified users. Red star and e-celebrity users account for 91.08% of verified users (see Figure 2).
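The friends-of-friends strategy amounts to a breadth-first traversal over the graph of bilateral (mutual-follow) relationships. The sketch below does not call the Sina Weibo REST API: `get_friend_ids` and `get_profile` are hypothetical callables standing in for the API requests, `mutual_friends` derives friendships from an in-memory follow graph for illustration, and the `max_users` cap is an assumption. Unlike the pseudocode in Table 1, which only checks whether an id is currently in the queue, it keeps a `seen` set so that a user who has already been dequeued is never fetched twice.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Set


def crawl_profiles(seed: int,
                   get_friend_ids: Callable[[int], Iterable[int]],
                   get_profile: Callable[[int], dict],
                   max_users: int = 1000) -> List[dict]:
    """Breadth-first crawl over bilateral friendships (friends of friends)."""
    queue = deque([seed])
    seen: Set[int] = {seed}  # every uid ever enqueued, so none is visited twice
    profiles: List[dict] = []
    while queue and len(profiles) < max_users:
        uid = queue.popleft()
        profiles.append(get_profile(uid))
        for friend in get_friend_ids(uid):
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return profiles


def mutual_friends(follows: Dict[int, Set[int]]) -> Callable[[int], List[int]]:
    """Build a get_friend_ids function from a follow graph: u and v are
    friends only if each follows the other (Scenario 3 in Figure 1)."""
    def get_friend_ids(uid: int) -> List[int]:
        return sorted(v for v in follows.get(uid, set())
                      if uid in follows.get(v, set()))
    return get_friend_ids
```

For example, with `follows = {1: {2, 3}, 2: {1, 4}, 3: set(), 4: {2}}`, a crawl seeded at user 1 reaches users 1, 2 and 4; user 3 is never reached because 1 follows 3 but 3 does not follow back, which is the spam-avoiding effect of using bilateral relationships.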
4. Data Discretization
The data mining process involves a preprocessing step to ensure that the data have the quality and the format required by the algorithm. Users are classified by their attributes. For example, according to NoA, users are classified into two groups: the grassroots and social star. Users in the latter group have many more followers than users in the former. Other continuous data are replaced in a similar way (see Table 2).

4.1. The K-Means Method and the Pareto Principle
This paper experimented with two methods: the k-means clustering algorithm and the Pareto principle. The purpose of clustering is to search for similar examples and group them into clusters such that the distance between examples within a cluster is as small as possible and the distance between clusters is as large as possible [9]. Let $P = \{p_1, p_2, \ldots, p_n\}$ be a set of data points to be clustered and $k$ the number of clusters (here, $k = 2$). Randomly select $k$ data points from $P$ as the initial centroids of the clusters, $C = \{c_1, c_2\}$. Then the following steps are repeated until convergence is reached:
1) Assign each data point $p_i \in P$, $i \in \{1, 2, \ldots, n\}$, to the closest centroid, either $c_1$ or $c_2$.
2) Recompute the centroids of the clusters $C = \{c_1, c_2\}$. The centroid is the mean of the points in the cluster.

The Pareto principle (also known as the 80-20 rule) [10] originally referred to the observation that 80% of Italy's wealth belonged to only 20% of the population. Here, we assume, for example, that a user who has more followers than 80% of the other users is classed as a social star. The quantile function used to calculate the cut points between the groups (e.g., the grassroots and social star) is defined as follows [11]:

$$F^{-1}(p) = \min\{x \in \mathbb{R} : F(x) \ge p\}, \quad p = 0.8 \tag{1}$$

Here, $X$ may refer to one of the variables in Table 2. The distribution function of $X$ is given by $F(x) = P(X \le x)$, where $F(x)$ represents the probability that $X$ is less than or equal to $x$. Equation (1) determines the point below which 80% of the data lie; e.g., 80% of NoA values are less than or equal to 1140, and 80% of NoR values are less than or equal to 294.

Table 1. Pseudocode for data collection.
    enqueue i in q
    while q is not empty do
        get friends_uids according to i
        for each j in friends_uids do
            if j does not exist in q then
                insert j into q
            end
        end
        dequeue k from q
        get profile according to k
        set i = k
    end

4.2. Discretization Index
In this paper, a discretization index (di) is proposed to measure the quality of the discretization produced by the above methods. Let $X = \{x_1, x_2, \ldots, x_n\}$ be a set of data points to be split. Suppose $X$ is partitioned into two groups $G = \{g_1, g_2\}$. The di is defined as follows:

$$di = \left(\max_{i \in \{1,2\}} \delta_i\right) \sum_{j=1}^{2} \sum_{x_k \in g_j} \left(x_k - \mu_j\right)^2 \tag{2}$$

where $\delta_i$ denotes the proportion of $g_i$ in $X$ and $\mu_j$ denotes the mean of the data points in $g_j$. The method with the smallest di is considered the best, based on the following criteria: 1) minimize the distances within the clusters and maximize the distances between the clusters; 2) split as equally as possible. Both criteria are needed because using the first criterion alone (i.e., clusters that are internally coherent but clearly different from each other) to split the data may produce an extremely uneven partition (see Section 4.3). As association rules are generated from frequent itemsets (see Section 5), data in the minority, for example, 200 social star users among 1,192,972 users, are very likely to be overlooked. More explanation of why partitioning as evenly as possible is important to this study is given in Section 5. We propose di to strike a balance between the two criteria.

Figure 2. The proportion of users.

Table 2. Data discretization.
Continuous data | Standard deviation | Skewness | Class | Partition quantity (a) | Interval (b) | di (×10^9), k-means | di (×10^9), Pareto
NoA | 219583.30 | 116.64 | The grassroots / Social star | 1,192,865 / 107 | [1, 1140) / [1140, 63717128] | 15.31 | 12.67
NoB | 485.80 | 2.00 | Self-centered / Scout | 1,029,534 / 163,438 | [1, 894) / [894, 2000] | 0.19 | 8.06
NoP | 3032.53 | 12.25 | Lurker / Blog zealot | 1,152,976 / 39,996 | [1, 2034) / [2034, 413549] | 12.36 | 0.88
NoR | 62489.64 | 131.02 | Valueless / Propagator | 1,192,897 / 75 | [0, 294) / [294, 18645439] | 4.10 | 3.44
NoC | 56012.94 | 353.34 | Uninterested / Topic initiator | 1,192,939 / 33 | [0, 206) / [206, 39344085] | 3.05 | 2.22
(a) Calculation was based on the partitions generated from the k-means method. (b) Calculation was based on the partitions generated from the Pareto principle.

4.3. Comparison between the Methods
We found that the choice of discretization method depends on the statistical properties of the data distribution. The spread of the data (i.e., standard deviation) and the symmetry of the data (i.e., skewness) may have a significant influence on the performance of the discretization. A higher standard deviation implies a greater spread of the data. A positive skewness indicates that the distribution is skewed right, and a higher skewness implies a longer tail on the right side; a normal distribution has a skewness of 0. We found that the k-means method is very good at creating clusters that are internally coherent but different from each other. However, the k-means method tends to partition data in an extremely uneven way when the distribution is skewed (see Table 2). On the other hand, partitioning based on the Pareto principle produces a lower di in most cases (four of the five variables in Table 2). Data are partitioned in an 80-20 way without impairing the internal coherence of the clusters or the differences between them.
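The two discretization methods and the discretization index can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: it runs one-dimensional k-means with k = 2, takes the Pareto cut point as the empirical quantile min{x : F(x) ≥ 0.8}, and computes di under our reading of the index, namely the largest group proportion multiplied by the within-group sum of squared deviations from the group means. Initializing k-means at the minimum and maximum values instead of at random points is our assumption, made so that results are deterministic.

```python
import math
from typing import List, Sequence


def kmeans_split(values: Sequence[float], max_iter: int = 100) -> List[int]:
    """Two-cluster 1-D k-means; returns label 0 (low group) or 1 (high group).

    Centroids start at the minimum and maximum (an assumption; the paper
    picks the initial centroids at random).
    """
    c = [float(min(values)), float(max(values))]
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Step 1: assign each point to its closest centroid (ties go to the low one).
        labels = [0 if abs(v - c[0]) <= abs(v - c[1]) else 1 for v in values]
        # Step 2: recompute each centroid as the mean of the points in its cluster.
        new_c = []
        for g in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == g]
            new_c.append(sum(members) / len(members) if members else c[g])
        if new_c == c:  # converged: centroids, hence assignments, are stable
            break
        c = new_c
    return labels


def pareto_cut(values: Sequence[float], p: float = 0.8) -> float:
    """Empirical quantile min{x : F(x) >= p} of the sample."""
    xs = sorted(values)
    # Smallest k with k / n >= p; rounding guards against float error in p * n.
    k = max(1, math.ceil(round(p * len(xs), 9)))
    return xs[k - 1]


def pareto_split(values: Sequence[float], p: float = 0.8) -> List[int]:
    """Label points below the cut as 0 and points at or above it as 1,
    matching half-open intervals such as [1, 1140) and [1140, ...]."""
    cut = pareto_cut(values, p)
    return [1 if v >= cut else 0 for v in values]


def discretization_index(values: Sequence[float], labels: Sequence[int]) -> float:
    """di = (largest group proportion) * (within-group sum of squared deviations).

    Follows our reading of the paper's index; smaller is better.
    """
    n = len(values)
    largest_share, within = 0.0, 0.0
    for g in set(labels):
        members = [v for v, lab in zip(values, labels) if lab == g]
        mu = sum(members) / len(members)
        within += sum((v - mu) ** 2 for v in members)
        largest_share = max(largest_share, len(members) / n)
    return largest_share * within
```

On a toy list such as `[1, 2, 3, 4, 5, 6, 7, 8, 9, 100]`, `kmeans_split` puts only the outlier 100 in the high group, while `pareto_split` puts the top three values there; this is the kind of uneven k-means partition of skewed data discussed in Section 4.3.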
We use examples to illustrate how the statistical properties of the data distribution can influence data discretization. As shown in Figure 3, the distribution of NoB, with a skewness of 2, is much closer to a normal distribution than those of the other variables, and it also has the lowest standard deviation. In this case, the k-means method produces a lower di than the Pareto principle. In comparison, data points in NoA are spread out over an extremely large range of values, from 1 to 63,717,128. A skewness of 116.64 indicates that the distribution of NoA has a very long tail on the right side (see Figure 3). As a consequence, the majority of data points in NoA fall within a very small range of values, and very few data points fall within an extremely wide range of values. In fact, 80% of the data points in NoA fall within the interval [1, 1140), and the rest fall within the interval [1140, 63,717,128]. In this case, the k-means method tends to group almost all data points into one cluster and put the rest into another: only 0.01% of users were classed as social star by the k-means method (see Table 2). Partitioning based on the Pareto principle is applied in this study because it makes a trade-off between the two criteria.
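Both statistics used in this comparison can be computed directly from the data. The sketch below uses the population standard deviation and the moment coefficient of skewness, g1 = m3 / m2^(3/2), where m2 and m3 are the second and third central moments; the paper does not state which skewness estimator it used, so this choice is an assumption.

```python
import math
from typing import Sequence, Tuple


def spread_and_skewness(values: Sequence[float]) -> Tuple[float, float]:
    """Return (population standard deviation, moment skewness g1)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    std = math.sqrt(m2)
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0  # constant data: treat as symmetric
    return std, skew
```

A symmetric sample gives g1 = 0, and a long right tail (a few very large values, as in NoA) gives a large positive g1, which is the situation in which k-means produced the most uneven splits above.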
5. Mining Association Rules in Sina Weibo
5.1. Association Rule Mining
Association rule mining can be conceptualized as follows [9]. Let $\mathcal{I} = \{I_1, I_2, \ldots, I_n\}$ be the set of all items. Let $DB$ be a set of database transactions, where each transaction $T$ is a set of items such that $T \subseteq \mathcal{I}$. Let $A$ be a set of items. A transaction $T$ is said to contain $A$ if and only if $A \subseteq T$. An association rule is an implication of the form $A \Rightarrow B\ [s, c, l]$, where $A \subseteq \mathcal{I}$, $B \subseteq \mathcal{I}$ and $A \cap B = \emptyset$. The support $s$, confidence $c$ and lift $l$ of the rule $A \Rightarrow B$ are defined as:

$$s = P(A \cup B) = \frac{F(A \cup B)}{|DB|} \tag{3}$$

$$c = P(B \mid A) = \frac{F(A \cup B)}{F(A)} \tag{4}$$

$$l = \frac{P(A \cup B)}{P(A)\,P(B)} \tag{5}$$

where $F(A)$ stands for the number of transactions in $DB$ containing the itemset $A$, $|DB|$ denotes the total number of transactions in $DB$, and $P(A \cup B)$ is the probability that a transaction contains every item of $A \cup B$. Rules with support greater than a minimum support threshold $s_{min}$ and confidence greater than a minimum confidence threshold $c_{min}$ are called strong. A set of items is referred to as an itemset, and an itemset that contains $k$ items is a $k$-itemset. The support count of an itemset is the number of transactions containing the itemset. The minimum support count is defined as $s_{min} \cdot |DB|$. An itemset is frequent if its support count is not less than the minimum support count.

Figure 3. Data distribution, panels (a)-(f).

5.2. Apriori Algorithm
Apriori is an influential algorithm for mining frequent itemsets for Boolean association rules. The name of the algorithm is based on the fact that it uses the Apriori property, i.e., all nonempty subsets of a frequent itemset must also be frequent. Let $L_i$ be the set of frequent $i$-itemsets. Given $L_{k-1}$, the Apriori algorithm finds $L_k$ using join and prune actions as follows:
1) Join: To find $L_k$, a set of candidate $k$-itemsets, denoted $C_k$, is generated by joining $L_{k-1}$ with itself. With the items of each itemset kept in sorted order, two $(k-1)$-itemsets $A$ and $B$ are joinable if their first $(k-2)$ items are in common. For example, $A = \{x_1, \ldots, x_{k-2}, x_{k-1}\}$ and $B = \{x_1, \ldots, x_{k-2}, x_k\}$ are joinable, and the resulting candidate $k$-itemset is $\{x_1, \ldots, x_{k-2}, x_{k-1}, x_k\}$.
2) Prune: $C_k$ can be huge. To reduce the size of $C_k$, the Apriori property is used as follows. If any $(k-1)$-subset of a candidate $k$-itemset is not in $L_{k-1}$, the candidate cannot be frequent either and so can be removed from $C_k$. The set of remaining candidates in $C_k$ is a superset of $L_k$; that is, its elements may or may not be frequent, but all of the frequent $k$-itemsets must be included in $C_k$. A scan of the database to determine the count of each candidate in $C_k$ then determines $L_k$: all candidates having a count no less than the minimum support count are frequent and therefore belong to $L_k$.
With the Apriori algorithm, all frequent itemsets, along with their support counts, can be found efficiently.

5.3. Experimental Design
Suppose the dataset U contains all the data collected from Sina Weibo. Association rules are mined from both U and its subsets. Because of the property of association rule mining described above, rare types of users are very likely to be pruned due to their relatively low support counts. Splitting U into disjoint subsets based on VT and mining association rules from them separately is therefore necessary to avoid overlooking interesting patterns hidden in the rare types of users. The dataset U is divided into two subsets: verified_accounts and unverified_accounts. The association rules of verified_accounts and unverified_accounts are compared to identify the differences between verified and unverified users. If necessary, the dataset verified_accounts can be further divided according to VT. In this paper, a comparison between red star and e-celebrity users is conducted for two reasons: 1) red star and e-celebrity users together account for 91.08% of verified users; 2) red star refers to the masses, as opposed to e-celebrities, who are public figures and professionals well known in their communities.
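The join, prune and count steps of the Apriori algorithm in Section 5.2 can be sketched in Python as follows. This is a minimal illustration under our own assumptions: transactions are sets of hypothetical "attribute=label" strings (e.g. "NoA=Social star") standing for discretized profiles, itemsets are sorted tuples so that the join condition (same first k-2 items) can be checked by slicing, and support counts come from one full scan per level rather than any optimized counting structure.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Itemset = Tuple[str, ...]


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[Itemset, int]:
    """Return every frequent itemset (as a sorted tuple) with its support count."""
    min_count = min_support * len(transactions)  # minimum support count s_min * |DB|

    def count(candidates: List[Itemset]) -> Dict[Itemset, int]:
        # One database scan: count the transactions containing each candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if t.issuperset(c):
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_count}

    items = sorted({i for t in transactions for i in t})
    level = count([(i,) for i in items])  # L1
    frequent = dict(level)
    k = 2
    while level:
        prev = sorted(level)
        candidates: List[Itemset] = []
        for a, b in combinations(prev, 2):
            # Join: a and b share their first k-2 items.
            if a[:-1] == b[:-1]:
                cand = tuple(sorted(set(a) | set(b)))
                # Prune: every (k-1)-subset must already be frequent.
                if all(sub in level for sub in combinations(cand, k - 1)):
                    candidates.append(cand)
        level = count(candidates)  # Lk
        frequent.update(level)
        k += 1
    return frequent
```

Keeping itemsets as sorted tuples makes the prune step a plain dictionary lookup, because `combinations` over a sorted tuple yields its subsets in the same sorted form.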
Because users who have a large number of followers (followings, posts, reposts, comments) account for only a very small part of the data, we have to choose a relatively small $s_{min}$ to ensure that rules for social star (scout, blog zealot, propagator, topic initiator) can be elicited. Association rules are sorted by lift. For an association rule of the form $A \Rightarrow B\ [s, c, l]$, a lift equal to 1 means that the occurrence of $A$ is independent of the occurrence of $B$, and a lift greater than 1 indicates that the occurrence of $A$ has a positive effect on the occurrence of $B$. We are interested in profile attributes that are dependent on each other.
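Ranking rules by lift can be sketched as follows: for each frequent itemset A ∪ B, every split into antecedent A and consequent B yields a rule whose support, confidence and lift are computed from counts exactly as defined in Section 5.1, and the surviving rules are sorted by lift in descending order. The thresholds `min_support` and `min_confidence` and the `max_len` cap are hypothetical parameters; treating a threshold as met when the value equals it is our choice. For self-containment the sketch counts itemsets by brute force instead of reusing an Apriori implementation, which is only practical for small toy data.

```python
from itertools import combinations
from typing import FrozenSet, List, NamedTuple, Set


class Rule(NamedTuple):
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    support: float
    confidence: float
    lift: float


def mine_rules(transactions: List[Set[str]], min_support: float,
               min_confidence: float, max_len: int = 3) -> List[Rule]:
    """Generate rules A => B and return them sorted by lift, highest first."""
    n = len(transactions)

    def freq(items: FrozenSet[str]) -> int:
        # F(X): number of transactions containing every item of X.
        return sum(1 for t in transactions if items <= t)

    universe = sorted({i for t in transactions for i in t})
    rules: List[Rule] = []
    for size in range(2, max_len + 1):
        for combo in combinations(universe, size):
            both = frozenset(combo)
            f_ab = freq(both)
            if f_ab == 0 or f_ab / n < min_support:
                continue  # A ∪ B is not frequent enough
            for r in range(1, size):
                for ante in combinations(combo, r):
                    a = frozenset(ante)
                    b = both - a
                    conf = f_ab / freq(a)          # c = F(A ∪ B) / F(A)
                    if conf < min_confidence:
                        continue
                    lift = conf / (freq(b) / n)    # l = P(A ∪ B) / (P(A) P(B))
                    rules.append(Rule(a, b, f_ab / n, conf, lift))
    return sorted(rules, key=lambda rule: rule.lift, reverse=True)
```

The lift line uses the identity l = c / P(B), which follows from dividing Equation (5) through by P(A).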
6. Empirical Results
We found that both NoR and NoC play important roles in a user's popularity (see Figure 4). If a user's posts are welcomed, that is, the posts are forwarded many times or many people comment on them, the owner of the posts is much more likely to be tagged as a social star. Another finding is that NoC is positively correlated with NoR: 3 of the 4 rules for "NoR = Propagator" were attributed to "NoC = Topic initiator" (i.e., Rules #3, #5 and #6). Also, an e-celebrity user is always accompanied by a large number of followers (social star). Thus, social star is a good indicator of e-celebrity. We found that a 5-year social star user is an e-celebrity with a confidence of 54.16%.

Figure 4. Top 10 rules derived from U.

A comparison between the association rules derived from unverified_accounts and verified_accounts was made (see Figures 5 and 6). A positive correlation between NoC and NoR exists in both of them; however, rules in unverified_accounts have higher lift than those in verified_accounts. In other words, NoC and NoR are more dependent on one another in unverified_accounts. According to the above findings, for a verified user, an increase in NoC may not increase the probability of an increase in NoR. On the other hand, for an ordinary user who has not been verified yet, saying something controversial to receive comments (NoC) is a good way to increase the rate of diffusion of his or her posts (NoR). In fact, this has already happened in many online social networks, where people initiate topics in order to become famous [12].
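The comparison above rests on the experimental design of Section 5.3: the same rule is evaluated separately in disjoint subsets split by verified type. The sketch below partitions records into unverified_accounts and verified_accounts (and, within the verified ones, one subset per VT) and computes the lift of a single rule in each subset. The record layout, a pair of (verified type or None, set of discretized items), is our assumption, and any toy records used with it are invented purely to exercise the code; they are not the paper's data.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Optional, Set, Tuple

Record = Tuple[Optional[str], Set[str]]  # (verified type or None, discretized items)


def split_by_verification(records: Iterable[Record]) -> Dict[str, List[Set[str]]]:
    """unverified_accounts and verified_accounts are disjoint; the per-VT
    subsets (e.g. "red_star", "e-celebrity") partition verified_accounts."""
    subsets: Dict[str, List[Set[str]]] = defaultdict(list)
    for vt, items in records:
        if vt is None:
            subsets["unverified_accounts"].append(items)
        else:
            subsets["verified_accounts"].append(items)
            subsets[vt].append(items)
    return dict(subsets)


def lift(transactions: List[Set[str]], a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """l = P(A ∪ B) / (P(A) P(B)) within one subset."""
    n = len(transactions)
    f_a = sum(1 for t in transactions if a <= t)
    f_b = sum(1 for t in transactions if b <= t)
    f_ab = sum(1 for t in transactions if (a | b) <= t)
    if f_a == 0 or f_b == 0:
        raise ValueError("lift is undefined when A or B never occurs")
    return (f_ab / n) / ((f_a / n) * (f_b / n))
```

Computing lift per subset, rather than on U as a whole, is what allows a rule such as NoC = Topic initiator ⇒ NoR = Propagator to show different strengths for verified and unverified users.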
Although red_star is one of the disjoint subsets of verified_accounts, the dependence, in terms of lift, between NoC and NoR in red_star is much stronger than in verified_accounts as a whole (see Figure 7). On the other hand, rules derived from e-celebrity are less interesting in terms of lift (see Figure 8). We found that if an e-celebrity user's posts are welcomed, then he or she is a blog zealot with a confidence greater than 65%. In fact, this happens with many user types. Unlike red star users, users such as corporation, media and application software have a strong motive for promoting themselves or something else. As a consequence, they are likely to send as many messages as possible. At the same time, due to their high reputation, other users prefer to forward their posts or have discussions with them. Whether their posts are welcomed is therefore independent of whether they post a lot. For this reason, the lift values in Figure 8 are very close to 1.

Figure 5. Top 10 rules derived from unverified_accounts.
Figure 6. Top 10 rules derived from verified_accounts.
Figure 7. Top 5 rules derived from red_star.
Figure 8. Top 5 rules derived from e-celebrity.
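Why a rule can have high confidence yet a lift near 1 follows directly from the definitions of confidence and lift in Section 5.1. Dividing the lift by nothing more than the definition of confidence gives the bound below; this is our own illustrative step, and the paper does not report P(B) for the e-celebrity subset.

```latex
l = \frac{P(A \cup B)}{P(A)\,P(B)}
  = \frac{P(A \cup B)/P(A)}{P(B)}
  = \frac{c}{P(B)}
  \le \frac{1}{P(B)}, \qquad \text{since } c \le 1 .
```

So when the consequent (e.g. being a blog zealot) is already very common within a subset, P(B) is close to 1 and no rule with that consequent can have a lift much above 1, however high its confidence.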
7. Conclusion
In this study, we explored the relationships among different profile attributes using association rule mining. We found that a user is more likely to have a large number of followers (NoA) if his or her posts are forwarded many times (NoR) or many people get involved in the discussions he or she initiates (NoC). Our results indicate that NoR and NoC are strongly dependent on each other for ordinary users (both unverified users and red star users), whereas the profile attributes of verified users are relatively independent of one another. We also examined both the k-means method and the Pareto principle as methods for data discretization, and found that the statistical properties of the data distribution can have a significant influence on data discretization. Because the data used in this study are heavily skewed, we suggest using the Pareto principle to partition the data.

REFERENCES
[1] C. A. Lampe, N. Ellison and C. Steinfield, "A Familiar Face(Book): Profile Elements as Signals in an Online Social Network," Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2007, pp. 435-444.
[2] A. Mislove, B. Viswanath, K. P. Gummadi and P. Druschel, "You Are Who You Know: Inferring User Profiles in Online Social Networks," Proceedings of the Third ACM International Conference on Web Search and Data Mining, 2010, pp. 251-260.
[3] D. Quercia, M. Kosinski, D. Stillwell and J. Crowcroft, "Our Twitter Profiles, Our Selves: Predicting Personality with Twitter," 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust (PASSAT) and 2011 IEEE Third International Conference on Social Computing, 2011, pp. 180-185.
[4] D. Clark, R. Crandall and Y. Mei, "4th Annual China 2.0 Conference Underscores Business Innovation, Social Impact and U.S.-China Links," 2013. http://sprie.gsb.stanford.edu/news/4th_annual_china_20_conference_underscores_business_innovation_social_impact_and_uschina_links_20131022/
[5] Z. Guo, Z. Li, H. Tu and L. Li, "Characterizing User Behavior in Weibo," 2012 Third FTRA International Conference on Mobile, Ubiquitous, and Intelligent Computing (MUSIC), 2012, pp. 60-65.
[6] J. Chen and J. She, "An Analysis of Verifications in Microblogging Social Networks - Sina Weibo," 2012 32nd International Conference on Distributed Computing Systems Workshops (ICDCSW), 2012, pp. 147-154.
[7] C. Wang, X. Guan, T. Qin and W. Li, "Who Are Active? An In-Depth Measurement on User Activity Characteristics in Sina Microblogging," Global Communications Conference (GLOBECOM), 2012, pp. 2083-2088.
[8] Sina Open API. /wiki/
[9] J. Han, M. Kamber and J. Pei, "Data Mining: Concepts and Techniques," Morgan Kaufmann, 2006.
[10] J. M. Juran and A. B. Godfrey, "Juran's Quality Handbook (Vol. 2)," McGraw Hill, New York, 1999.
[11] I. Frohne and R. J. Hyndman, "Sample Quantiles," R Project, 2009.
[12] J. Feng, "Romancing the: Producing and Consuming Chinese Web Romance," Brill, 2013. doi: 10.1163/9789004259720
