How to attach the Hadoop source code in Eclipse
        /* To read the source of org.apache.hadoop.mapreduce.Mapper.Context or java.lang.InterruptedException
           (for example, to look at the source of map), hold Ctrl and click the name. When "Attach Source Code"
           appears, choose External Location / External File and point it at the source directory,
           e.g. D:\hadoop-2.7.4\src.
           Here key is the byte offset of the start of this line from the beginning of the file,
           and value is the text of the line.
        */
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            System.out.println("key is 馬克-to-win @ 馬克java社區(qū) " + key.toString() + " value is " + value.toString());
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            System.out.println("reduce key is 馬克-to-win @ 馬克java社區(qū) " + key.toString());
            int sum = 0;
            for (IntWritable val : values) {
                int valValue = val.get();
                System.out.println("valValue is" + valValue);
                sum += valValue;
            }
            result.set(sum);
            context.write(key, result);
        }
    }
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        // job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        /* A non-zero exit code indicates abnormal termination. */
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
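The map method at the top of this listing uses the fields word and one without declaring them; they belong to the TokenizerMapper class, whose declaration is presumably given earlier in the full listing. A minimal sketch of that declaration, following the standard Hadoop WordCount pattern (names assumed), looks like this:

public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    // every token is emitted with a fixed count of 1; the reducer sums these counts
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    // the map method shown above goes here
}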
Under the src directory, create a log4j.properties file (so that the program's console output can be seen):
#log4j.rootLogger=debug,stdout,R
log4j.rootLogger=info,stdout,R
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%5p - %m%n
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.File=mapreduce_test.log
log4j.appender.R.MaxFileSize=1MB
log4j.appender.R.MaxBackupIndex=1
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%p %t %c - %m%n
log4j.logger.com.codefutures=DEBUG
Next, configure the run parameters.
Right-click the project name and choose "Run As" - "Run Configurations". On the Main tab, use Browse and Search to fill in the project and main class as shown in the figure below. On the "Arguments" tab, add the following two program arguments (the second one, hdfs://localhost:9000/output2, must not already exist):
hdfs://localhost:9000/README.txt
hdfs://localhost:9000/output2
Note: generally speaking, MapReduce input and output both live on an HDFS distributed file system (other file systems are possible as well), so the job's input file must first be uploaded to HDFS, and the output likewise ends up on HDFS. When the job starts, MapReduce splits the input file so that multiple map tasks can process it in parallel, which improves throughput and is what makes the computation distributed. For the same reason, the program logic written here also runs unchanged on a Hadoop cluster.
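If README.txt has not been uploaded to HDFS yet, this can be done from Eclipse's DFS Locations view, from the command line, or programmatically. A minimal sketch using the Hadoop FileSystem API (the local path D:/README.txt is only an assumed example):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadToHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // same HDFS address as in the run arguments above
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);
        // copy the local file to the HDFS root (the local path is an assumed example)
        fs.copyFromLocalFile(new Path("D:/README.txt"), new Path("/README.txt"));
        fs.close();
    }
}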
Then click "Run" to run the job.
6) Viewing the results
Open the newly generated part-r-00000 file:
a 2
hello 4
mark 1
to 1
win 1
It lists the number of times each word appears in the README.txt file passed as the first argument. That is exactly what WordCount is supposed to do, only here it is done with MapReduce's parallel programming model, and this is exactly the result we want.
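Besides opening part-r-00000 in Eclipse's DFS Locations view, the result file can also be read programmatically. A minimal sketch (the path matches the second run argument used above):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadResult {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);
        // part-r-00000 is the output of the single reduce task
        Path result = new Path("/output2/part-r-00000");
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(result)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);  // e.g. "hello	4"
            }
        }
        fs.close();
    }
}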
The input file:
hello a hello win
hello a to
hello mark
The execution output:
INFO - Initializing JVM Metrics with processName=JobTracker, sessionId=
INFO - Total input paths to process : 1
INFO - number of splits:1
INFO - Submitting tokens for job: job_local936781335_0001
INFO - Running job: job_local936781335_0001
INFO - Waiting for map tasks
INFO - Starting task: attempt_local936781335_0001_m_000000_0
INFO - Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@1d5f80b
INFO - Processing split: hdfs://localhost:9000/README.txt:0+31
INFO - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
key is 馬克-to-win @ 馬克java社區(qū) 0 value is hello a hello win
key is 馬克-to-win @ 馬克java社區(qū) 19 value is hello a to
key is 馬克-to-win @ 馬克java社區(qū) 31 value is hello mark
INFO - Starting flush of map output
INFO - Spilling map output
INFO - Finished spill 0
INFO - Task:attempt_local936781335_0001_m_000000_0 is done. And is in the process of committing
INFO - map
INFO - Task 'attempt_local936781335_0001_m_000000_0' done.
INFO - Finishing task: attempt_local936781335_0001_m_000000_0
INFO - map task executor complete.
INFO - Waiting for reduce tasks
INFO - Starting task: attempt_local936781335_0001_r_000000_0
INFO - Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@490476
INFO - org.apache.hadoop.mapreduce.task.reduce.Shuffle@e7707f
INFO - MergerManager: memoryLimit=181665792, maxSingleShuffleLimit=45416448, mergeThreshold=119899424, ioSortFactor=10, memToMemMergeOutputsThreshold=10
INFO - attempt_local936781335_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
INFO - localfetcher#1 about to shuffle output of map attempt_local936781335_0001_m_000000_0 decomp: 68 len: 72 to MEMORY
INFO - Read 68 bytes from map-output for attempt_local936781335_0001_m_000000_0
INFO - closeInMemoryFile -> map-output of size: 68, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->68
INFO - EventFetcher is interrupted.. Returning
INFO - 1 / 1 copied.
INFO - finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
INFO - Merging 1 sorted segments
INFO - Down to the last merge-pass, with 1 segments left of total size: 60 bytes
INFO - Merged 1 segments, 68 bytes to disk to satisfy reduce memory limit
INFO - Merging 1 files, 72 bytes from disk
INFO - Merging 1 sorted segments
INFO - 1 / 1 copied.
reduce key is 馬克-to-win @ 馬克java社區(qū) a
valValue is1
valValue is1
reduce key is 馬克-to-win @ 馬克java社區(qū) hello
valValue is1
valValue is1
valValue is1
valValue is1
reduce key is 馬克-to-win @ 馬克java社區(qū) mark
valValue is1
reduce key is 馬克-to-win @ 馬克java社區(qū) to
valValue is1
reduce key is 馬克-to-win @ 馬克java社區(qū) win
valValue is1
INFO - map 100% reduce 0%
INFO - Task:attempt_local936781335_0001_r_000000_0 is done. And is in the process of committing
INFO - 1 / 1 copied.
INFO - Task attempt_local936781335_0001_r_000000_0 is allowed to commit now
INFO - Saved output of task 'attempt_local936781335_0001_r_000000_0' to hdfs://localhost:9000/output9/_temporary/0/task_local936781335_0001_r_000000
INFO - reduce > reduce
INFO - Task 'attempt_local936781335_0001_r_000000_0' done.
INFO - Finishing task: attempt_local936781335_0001_r_000000_0
INFO - reduce task executor complete.
INFO - map 100% reduce 100%
INFO - Job job_local936781335_0001 completed successfully
INFO - Counters: 35
File System Counters
FILE: Number of bytes read=486
FILE: Number of bytes written=599966
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=62
HDFS: Number of bytes written=26
HDFS: Number of read operations=13
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Map-Reduce Framework
Map input records=3
Map output records=6
Map output bytes=54
Input split bytes=97
Combine input records=0
Combine output records=0
Reduce input groups=4
Reduce shuffle bytes=72
Reduce input records=6
Reduce output records=4
Spilled Records=12
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=46
Total committed heap usage (bytes)=243499008
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=31
File Output Format Counters
Bytes Written=26