Hadoop is a project of the Apache Software Foundation: a freely distributed set of utilities, libraries and a framework for developing and running distributed programs on clusters of hundreds and thousands of nodes. It is used to implement search and contextual mechanisms for many high-load websites, including Yahoo and Facebook. It is written in Java and follows the MapReduce computing paradigm, in which an application is divided into a large number of identical elementary tasks that run on the cluster nodes and are naturally combined into the final result.

Apache Hadoop
Type: framework
Authors: Doug Cutting and Mike Cafarella
Developer: Apache Software Foundation
Written in: Java
Operating systems: cross-platform, POSIX
First release: 1 April 2006
Hardware platform: Java Virtual Machine
Latest version: 3.4.0 (17 March 2024)
Repository: git-wip-us.apache.org/re…, gitbox.apache.org/repos…, github.com/apache/hadoop
License: Apache License 2.0 and GNU GPL
Website: hadoop.apache.org (in English)

As of 2014, the project consists of four modules: Hadoop Common (middleware: a set of infrastructure software libraries and utilities used by the other modules and by related projects), HDFS (a distributed file system), YARN (a system for job scheduling and cluster management) and Hadoop MapReduce (a platform for programming and executing distributed MapReduce computations). Hadoop previously included a number of other projects that later became independent projects of the Apache Software Foundation.

Hadoop is considered one of the foundational technologies of big data. A whole ecosystem of related projects and technologies has formed around it, many of which were originally developed within Hadoop and later became independent. Since the second half of the 2000s the technology has been actively commercialized: several companies build their business entirely on commercial Hadoop distributions and technical support for the ecosystem, and practically all major vendors of enterprise information technology include Hadoop in their product strategies and solution lines in one form or another.

History
Development was started in early 2005 by Doug Cutting to build a distributed computing infrastructure for Nutch, a free search engine written in Java. Its conceptual basis was the paper by Google employees Jeffrey Dean and Sanjay Ghemawat describing the MapReduce computing concept. The new project was named after a toy elephant belonging to the founder's child. During 2005–2006 Hadoop was developed part-time by two developers, Cutting and Mike Cafarella, first within the Nutch project and then within Lucene. Yahoo then invited Cutting to lead a dedicated team building distributed computing infrastructure, and Hadoop was split off into a separate project at about the same time. In January 2008 Hadoop became a top-level project of the Apache Software Foundation, and in February 2008 Yahoo launched a cluster search engine running on 10,000 processor cores and managed by Hadoop.

In April 2008 Hadoop set a world performance record in a standardized data-sorting benchmark, sorting 1 TB in 209 seconds on a 910-node cluster. From then on Hadoop spread widely beyond Yahoo: Last.fm, Facebook and The New York Times adopted it for their sites, and Hadoop was adapted to run in the Amazon EC2 cloud. In April 2010 Google granted the Apache Software Foundation the right to use the MapReduce technology, three months after it was patented in the United States, thereby protecting the foundation from possible patent claims.

Since 2010 Hadoop has repeatedly been described as a key big data technology, its wide adoption for massively parallel data processing has been forecast, and alongside Cloudera a series of technology startups focused entirely on commercializing Hadoop has appeared. During 2010 several Hadoop subprojects (HBase, Hive, ZooKeeper) became top-level Apache projects one after another, which marked the beginning of an ecosystem around Hadoop. In March 2011 Hadoop received the Guardian media group's annual innovation award; at the ceremony the technology was called "the Swiss army knife of the 21st century".

The YARN module introduced in Hadoop 2.0, released in autumn 2013, was regarded as a significant leap that took Hadoop beyond the MapReduce paradigm and made it a general-purpose platform for distributed data processing.

Hadoop Common
Hadoop Common includes libraries for managing the file systems supported by Hadoop, as well as scripts for creating the necessary infrastructure and managing distributed processing. To make these tasks convenient, a specialized simplified command-line interpreter, the FS shell (filesystem shell), is provided. It is launched from the operating system shell with a command of the form hdfs dfs -command URI, where command is an interpreter command and URI is a list of resources whose prefixes indicate the type of supported file system, for example hdfs://example.com/file1 or file:///tmp/local/file2. Most interpreter commands are modeled on the corresponding Unix commands, such as cat, chmod, chown, chgrp, cp, du, ls, mkdir, mv, rm and tail, and some options of those Unix commands are supported, for example the recursive option -R for chmod, chown and chgrp. There are also Hadoop-specific commands: count reports the number of directories, files and bytes under a given path, expunge empties the trash, and setrep changes the replication factor of a given resource.

HDFS
HDFS (Hadoop Distributed File System) is a file system designed to store large files that are split into blocks distributed among the nodes of a computing cluster. All blocks of a file except the last one have the same size, and each block can be stored on several nodes; the block size and the replication factor (the number of nodes on which each block must be stored) are set per file. Replication makes the distributed system tolerant of failures of individual nodes. Files in HDFS can be written only once (existing data cannot be modified), and only one process can write to a file at a time. The namespace is organized in the traditional hierarchical way: there is a root directory, directories can be nested, and a directory can contain both files and other directories.

An HDFS deployment consists of a central name node, which stores the file system metadata and the information about where blocks are placed, and a set of data nodes, which store the file blocks themselves. The name node handles file- and directory-level operations (opening and closing files, manipulating directories), while the data nodes perform the actual reading and writing of data. Both the name node and the data nodes run web servers that display the current status of the nodes and allow browsing the contents of the file system. Administrative functions are available from the command-line interface.

HDFS is an integral part of the project, but Hadoop can also work with other distributed file systems without using HDFS at all.
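The division of labor between the name node and the data nodes can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not HDFS code: ToyNameNode, FileMeta and block_sizes are hypothetical names, the placement rule is simple round-robin rather than HDFS's own rack-aware policy, and the sizes are tiny for readability (real block sizes are many megabytes). It shows that the name node keeps only metadata, that every block except the last has the same size, and that each block receives as many distinct replica locations as the file's replication factor.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def block_sizes(length: int, block_size: int) -> list[int]:
    """Split a file length into block sizes: all full-size except maybe the last."""
    full, rest = divmod(length, block_size)
    return [block_size] * full + ([rest] if rest else [])


@dataclass
class FileMeta:
    """Per-file metadata kept by the toy name node (hypothetical structure)."""
    length: int
    block_size: int    # chosen per file
    replication: int   # chosen per file
    # For each block: names of the data nodes holding one of its replicas.
    block_locations: list[list[str]] = field(default_factory=list)


class ToyNameNode:
    """Holds only the namespace and block locations, never the data itself."""

    def __init__(self, datanodes: list[str]) -> None:
        self.datanodes = datanodes
        self.namespace: dict[str, FileMeta] = {}
        self._cursor = 0  # rotates the first replica across data nodes

    def create(self, path: str, length: int, block_size: int,
               replication: int) -> FileMeta:
        if path in self.namespace:
            # Write-once model: an existing file is never rewritten.
            raise FileExistsError(path)
        if replication > len(self.datanodes):
            raise ValueError("replication factor exceeds the number of data nodes")
        meta = FileMeta(length, block_size, replication)
        n = len(self.datanodes)
        for _ in block_sizes(length, block_size):
            # Round-robin placement on distinct nodes (an assumption;
            # real HDFS uses a rack-aware placement policy).
            replicas = [self.datanodes[(self._cursor + k) % n]
                        for k in range(replication)]
            meta.block_locations.append(replicas)
            self._cursor = (self._cursor + 1) % n
        self.namespace[path] = meta
        return meta


nn = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
meta = nn.create("/logs/day1.txt", length=300, block_size=128, replication=3)
print(block_sizes(300, 128))  # [128, 128, 44]
print(meta.block_locations)
# [['dn1', 'dn2', 'dn3'], ['dn2', 'dn3', 'dn4'], ['dn3', 'dn4', 'dn1']]
```

Because each block has several replicas on different nodes, losing any single data node in this model still leaves at least two copies of every block, which is the fault-tolerance property that replication provides.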
Support for Amazon S3 and CloudStore as alternatives to HDFS is included in the main Hadoop distribution. Conversely, HDFS can be used not only for running MapReduce jobs but also as a general-purpose distributed file system: in particular, the distributed NoSQL database HBase is built on top of it, and the scalable machine learning system Apache Mahout runs in its environment.

YARN
YARN (Yet Another Resource Negotiator) is a module introduced in version 2.0 (2013) that is responsible for managing cluster resources and scheduling jobs. In earlier releases this function was built into the MapReduce module, where a single component, the JobTracker, implemented it. In YARN it is handled by a logically independent daemon, the resource scheduler ResourceManager, which abstracts all of the cluster's computing resources and manages their allocation to distributed processing applications. Both MapReduce programs and any other distributed applications that support the corresponding programming interfaces can run under YARN. YARN allows several different tasks to run in parallel on one cluster and isolates them according to multi-tenancy principles. The developer of a distributed application must implement a special application management class, ApplicationMaster, which coordinates tasks within the resources granted by the resource scheduler; the resource scheduler, in turn, creates instances of the application management class and communicates with them over the corresponding network protocol.

YARN can be regarded as a cluster operating system, in the sense that it acts as the interface between the cluster's hardware resources and a broad class of applications that use its capacity for computation.

Hadoop MapReduce
Hadoop MapReduce is a software framework for programming distributed computations in the MapReduce paradigm.
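To make the paradigm concrete, here is a minimal single-process sketch of a MapReduce job (word count, a standard illustration). The functions mapper, reducer and run_job are hypothetical names, not part of any Hadoop API; the sketch only simulates the map, shuffle/sort and reduce stages in memory, whereas a real Hadoop job distributes them across cluster nodes and is usually written against the Java Mapper and Reducer interfaces.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator
from itertools import groupby
from operator import itemgetter


def mapper(line: str) -> Iterator[tuple[str, int]]:
    """Map step: turn one input record into intermediate (key, value) pairs."""
    for word in line.lower().split():
        yield word, 1


def reducer(key: str, values: Iterable[int]) -> tuple[str, int]:
    """Reduce step: fold all values that share one key into a single pair."""
    return key, sum(values)


def run_job(records: Iterable[str]) -> list[tuple[str, int]]:
    # Map: in Hadoop, each input split would be mapped on its own node.
    intermediate = [pair for record in records for pair in mapper(record)]
    # Shuffle/sort: order pairs by key so that equal keys become adjacent.
    intermediate.sort(key=itemgetter(0))
    # Reduce: one reducer call per distinct key, over that key's values.
    return [reducer(key, (value for _, value in group))
            for key, group in groupby(intermediate, key=itemgetter(0))]


print(run_job(["the cat sat", "The dog sat"]))
# [('cat', 1), ('dog', 1), ('sat', 2), ('the', 2)]
```

Because the sort stage places equal keys next to each other, the reducer can consume each key's values as one contiguous group; this is the same property the framework relies on when it feeds sorted map output to the reducers.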
The developer of a Hadoop MapReduce application must implement a base handler that, on each computing node of the cluster, transforms the source key-value pairs into an intermediate set of key-value pairs (a class implementing the Mapper interface, named after the higher-order function map), and a handler that reduces the intermediate set of pairs to the final, smaller set (the reduction, a class implementing the Reducer interface). The framework passes the sorted outputs of the base handlers to the reducers. Reduction consists of three phases: shuffle (selecting the relevant partition of the output); sort (grouping the mappers' outputs by key, an additional sort that is needed when different atomic handlers return sets with the same keys; the sorting rules for this phase can be set programmatically and can exploit the internal structure of the keys); and reduce itself (folding the list to obtain the resulting set). For some kinds of processing no reduction is needed; in that case the framework writes the pairs produced by the base handlers directly to the output, without sorting them.

Hadoop MapReduce also allows jobs whose base handlers and reducers are written without Java. The Hadoop Streaming utilities let any executable that works with the operating system's standard input and output (for example, UNIX shell utilities) act as a base handler or reducer, and there is also a SWIG-compatible C++ application programming interface, Hadoop Pipes. Hadoop distributions also include implementations of various specific base handlers and reducers that are commonly used in distributed processing.

In early versions, Hadoop MapReduce included the JobTracker job scheduler; starting with version 2.0 this function moved to YARN, and from that version on the Hadoop MapReduce module is implemented on top of YARN. The programming interfaces are largely preserved, but full backward compatibility is not: programs written for earlier versions of the API generally have to be modified or refactored to run on YARN, and binary backward compatibility is possible only under certain restrictions.

Scalability
One of Hadoop's main goals from the outset was horizontal scalability of the cluster by adding inexpensive commodity hardware nodes, without resorting to powerful servers and expensive storage networks. Working clusters of thousands of nodes confirm the feasibility and cost-effectiveness of such systems: as of 2011, large Hadoop clusters were known at Yahoo (more than 4,000 nodes with a total storage capacity of 15 PB), Facebook (about 2,000 nodes with 21 PB) and eBay (700 nodes with 16 PB). Nevertheless, horizontal scalability of Hadoop systems is considered limited: for Hadoop before version 2.0 the maximum was estimated at 4,000 nodes running 10 MapReduce tasks per node. This limit was largely caused by concentrating job lifecycle control in the MapReduce module; moving that control into the YARN module in Hadoop 2.0 and decentralizing it (delegating part of the monitoring to the processing nodes) is considered to have improved horizontal scalability.

Another limitation of Hadoop systems is the amount of RAM on the name node (NameNode), which keeps the entire cluster namespace in memory in order to distribute processing; the total number of files a name node can handle is about 100 million. To overcome this limitation, work is under way to distribute the name node, which is a single node in the current architecture, across several independent nodes. Another way around it is to use distributed databases on top of HDFS, such as HBase, in which records in one large database table play the role of files and directories from the application's point of view.

As of 2011, a typical cluster was built from single-socket multi-core x86-64 nodes running Linux, each with 3–12 disk drives, connected by a 1 Gbit/s network. There are trends both toward less powerful nodes with low-power processors (ARM, Intel Atom) and toward high-performance compute nodes combined with high-bandwidth networking (InfiniBand; a high-performance Fibre Channel storage network and 10 Gbit/s Ethernet in the FlexPod reference configurations for big data).

The scalability of Hadoop systems depends largely on the characteristics of the data being processed, above all their internal structure and the way the required information is extracted from them, and on the complexity of the processing task. These factors in turn dictate how processing cycles are organized, the computational intensity of atomic operations and, ultimately, the degree of parallelism and the cluster load. The manual for early Hadoop versions (before 2.0) stated that an acceptable degree of parallelism is 10–100 base handler (map) instances per cluster node, and up to 300 for tasks that need little CPU time. For reducers, the recommended number was the number of nodes multiplied by a factor of 0.95 or 1.75 and by the constant mapred.tasktracker.reduce.tasks.maximum. With the larger factor, the fastest nodes finish the first round of reduction earlier and receive a second batch of intermediate pairs to process, so increasing the factor adds overhead on the cluster but balances the load more effectively. YARN instead uses configuration constants that specify the amount of RAM and the number of virtual processor cores available to the resource scheduler (yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores, respectively, in the YARN configuration file), and the degree of parallelism is derived from them.

Commercialization
Amid Hadoop's growing popularity in 2008 and reports of Hadoop clusters being built at Yahoo and Facebook, Cloudera was founded in October 2008, headed by Michael Olson, the former CEO of the company that created Berkeley DB, with the sole aim of commercializing Hadoop technologies. In September 2009 Hadoop's lead developer Doug Cutting moved from Yahoo to Cloudera, and because of this move commentators called Cloudera Hadoop's new standard-bearer, even though most of the project had actually been written by Facebook and Yahoo employees. In 2009 another company was founded with the aim of building a high-performance Hadoop distribution and selling it as proprietary software. In April 2009 Amazon launched the Elastic MapReduce cloud service, which lets subscribers create Hadoop clusters and run jobs on them with pay-per-use billing. Later, Amazon Elastic MapReduce subscribers were given a choice between the classic Apache distribution and distributions from MapR.

In 2011 Yahoo spun off its unit that developed and operated Hadoop into an independent company, which soon reached an agreement with Microsoft to jointly develop a Hadoop distribution for Windows Azure and Windows Server. In the same year, as Hadoop came to be seen as one of the core big data technologies, practically all major vendors of enterprise technology software included Hadoop technologies in their strategies and product lines in one form or another. Oracle released a hardware-software appliance: a Hadoop cluster preassembled in a telecommunications rack and preconfigured with Cloudera's distribution. IBM built the BigInsights product on the Apache distribution. EMC licensed MapR's high-performance Hadoop to integrate it into the products of Greenplum, which EMC had recently acquired; this business unit was later spun off into an independent company, which switched to its own Hadoop distribution based entirely on Apache code. Teradata signed an agreement with Hortonworks to integrate Hadoop into its massively parallel processing appliance, Aster Big Analytics Appliance. In 2013 Intel created its own Hadoop distribution, abandoning it a year later in favor of Cloudera's solutions, in which it acquired an 18% stake.

The market for software and services around the Hadoop ecosystem was estimated at $540 million in 2012, with growth forecast to $1.6 billion by 2017; the market leaders are the Californian startups Cloudera, MapR and Hortonworks. In addition, Hadapt (acquired by Teradata in July 2014), Karmasphere and Platfora are noted as companies building their entire business on products that add analytical capabilities to Hadoop systems.
External links
hadoop.apache.org, the official Hadoop website