Now I need to build a multi-node cluster.
Multi-node! Kubernetes? docker compose? If I bring up Kubernetes, the amount of YAML to write is going to explode...
Still, I can't resist the one-click appeal of kubectl apply. I'll keep chipping away at it little by little, at home and on weekends too.
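Roughly, the docker-compose.yml I have in mind looks like this. Just a sketch, not the final file: the image name and mount paths are placeholders, while the service, network, and volume names are the ones that actually show up in the commands and logs below.

```yaml
services:
  hadoop-namenode:
    image: my-hadoop:3.4.1                  # placeholder image name
    container_name: hadoop-namenode
    networks: [hadoop_network]
    volumes:
      - namenode_volume:/hadoop/dfs/name    # mount path is an assumption
  hadoop-datanode1:
    image: my-hadoop:3.4.1
    container_name: hadoop-datanode1
    networks: [hadoop_network]
    volumes:
      - datanode1_volume:/hadoop/dfs/data
  hadoop-datanode2:
    image: my-hadoop:3.4.1
    container_name: hadoop-datanode2
    networks: [hadoop_network]
    volumes:
      - datanode2_volume:/hadoop/dfs/data
  hadoop-resourcemanager:
    image: my-hadoop:3.4.1
    container_name: hadoop-resourcemanager
    networks: [hadoop_network]

networks:
  hadoop_network:
    external: true    # created separately with `docker network create`

volumes:
  namenode_volume:
    external: true    # created separately with `docker volume create`
  datanode1_volume:
    external: true
  datanode2_volume:
    external: true
```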
docker network create hadoop_network
# create volumes
docker volume create namenode_volume
docker volume create datanode1_volume
docker volume create datanode2_volume
# start the hadoop cluster
docker-compose up -d
docker-compose logs
hadoop-namenode | Starting NameNode
hadoop-namenode | ERROR: HADOOP_CONF_DIR environment variable is not set.
hadoop-datanode1 | Starting NameNode
hadoop-datanode1 | ERROR: HADOOP_CONF_DIR environment variable is not set.
hadoop-datanode2 | Starting NameNode
hadoop-datanode2 | ERROR: HADOOP_CONF_DIR environment variable is not set.
hadoop-resourcemanager | Starting NameNode
hadoop-resourcemanager | ERROR: HADOOP_CONF_DIR environment variable is not set.
So I added HADOOP_CONF_DIR to each service through the environment section in docker-compose.
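Something along these lines (the /opt/hadoop path is a guess; it has to match wherever Hadoop actually lives in the image):

```yaml
services:
  hadoop-namenode:
    environment:
      - HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop   # placeholder path; point it at the image's conf dir
  # repeat the same environment entry for hadoop-datanode1, hadoop-datanode2, hadoop-resourcemanager
```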
Fixed by installing jsvc.
If port 22 isn't bound on the containers, you can't even make an ssh connection to them. Let's write the port 22 mapping into the docker-compose file.
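For example (the host-side port numbers are arbitrary picks so they don't collide with the host's own sshd):

```yaml
services:
  hadoop-namenode:
    ports:
      - "2201:22"   # host 2201 -> container 22; host-side numbers are just examples
  hadoop-datanode1:
    ports:
      - "2202:22"
  hadoop-datanode2:
    ports:
      - "2203:22"
  hadoop-resourcemanager:
    ports:
      - "2204:22"
```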
Now let's write the README and move on to mission 2b.
# approximate pi with a mapreduce job
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.1.jar pi 10 1000000
Number of Maps = 10
Samples per Map = 1000000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
2025-07-24 12:30:35,815 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at hadoop-resourcemanager/172.22.0.5:8032
2025-07-24 12:30:36,371 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1753359340183_0001
2025-07-24 12:30:36,495 INFO input.FileInputFormat: Total input files to process : 10
2025-07-24 12:30:36,566 INFO mapreduce.JobSubmitter: number of splits:10
2025-07-24 12:30:36,669 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1753359340183_0001
2025-07-24 12:30:36,669 INFO mapreduce.JobSubmitter: Executing with tokens: []
2025-07-24 12:30:36,799 INFO conf.Configuration: resource-types.xml not found
2025-07-24 12:30:36,800 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2025-07-24 12:30:37,393 INFO impl.YarnClientImpl: Submitted application application_1753359340183_0001
2025-07-24 12:30:37,445 INFO mapreduce.Job: The url to track the job: http://hadoop-resourcemanager:8088/proxy/application_1753359340183_0001/
2025-07-24 12:30:37,447 INFO mapreduce.Job: Running job: job_1753359340183_0001
2025-07-24 12:30:49,058 INFO mapreduce.Job: Job job_1753359340183_0001 running in uber mode : false
2025-07-24 12:30:49,059 INFO mapreduce.Job: map 0% reduce 0%
2025-07-24 12:30:54,169 INFO mapreduce.Job: map 20% reduce 0%
2025-07-24 12:31:00,207 INFO mapreduce.Job: map 100% reduce 0%
2025-07-24 12:31:02,213 INFO mapreduce.Job: map 100% reduce 100%
2025-07-24 12:31:02,218 INFO mapreduce.Job: Job job_1753359340183_0001 completed successfully
2025-07-24 12:31:02,313 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=226
FILE: Number of bytes written=3403301
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2700
HDFS: Number of bytes written=215
HDFS: Number of read operations=45
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=10
Launched reduce tasks=1
Data-local map tasks=10
Total time spent by all maps in occupied slots (ms)=71495
Total time spent by all reduces in occupied slots (ms)=5058
Total time spent by all map tasks (ms)=71495
Total time spent by all reduce tasks (ms)=5058
Total vcore-milliseconds taken by all map tasks=71495
Total vcore-milliseconds taken by all reduce tasks=5058
Total megabyte-milliseconds taken by all map tasks=73210880
Total megabyte-milliseconds taken by all reduce tasks=5179392
Map-Reduce Framework
Map input records=10
Map output records=20
Map output bytes=180
Map output materialized bytes=280
Input split bytes=1520
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=280
Reduce input records=20
Reduce output records=0
Spilled Records=40
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=2736
CPU time spent (ms)=7560
Physical memory (bytes) snapshot=2712104960
Virtual memory (bytes) snapshot=28513050624
Total committed heap usage (bytes)=2173698048
Peak Map Physical memory (bytes)=287764480
Peak Map Virtual memory (bytes)=2593390592
Peak Reduce Physical memory (bytes)=211243008
Peak Reduce Virtual memory (bytes)=2595799040
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1180
File Output Format Counters
Bytes Written=97
Job Finished in 24.047 seconds
Estimated value of Pi is 3.14158440000000000000
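For the record, the pi example is a Monte Carlo estimate: it scatters points over the unit square (using a quasi-random Halton sequence, if I remember right), counts how many land inside the inscribed circle, and reports pi ≈ 4 × (points inside) / (total points). Here that's 10 maps × 1,000,000 samples = 10,000,000 points, which gives the 3.14158... above.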
docker kill hadoop-namenode hadoop-datanode1 hadoop-datanode2 hadoop-resourcemanager
docker rm hadoop-namenode hadoop-datanode1 hadoop-datanode2 hadoop-resourcemanager
docker volume rm namenode_volume
docker volume rm datanode1_volume
docker volume rm datanode2_volume
docker volume rm datanode3_volume
docker volume create namenode_volume
docker volume create datanode1_volume
docker volume create datanode2_volume
docker volume create datanode3_volume
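A third datanode joins here (hence datanode3_volume). Its compose entry would presumably just mirror the other two, something like this (same placeholders as before):

```yaml
services:
  hadoop-datanode3:
    image: my-hadoop:3.4.1                   # same placeholder image as above
    container_name: hadoop-datanode3
    networks: [hadoop_network]
    environment:
      - HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop   # same placeholder path as above
    ports:
      - "2205:22"                            # example host port, as with the others
    volumes:
      - datanode3_volume:/hadoop/dfs/data    # mount path is an assumption

volumes:
  datanode3_volume:
    external: true
```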