Chapter 2: Hive Tutorial - Word Count

Implementing word count with a Java MapReduce program is also quite simple; the code below is the "hello world" of MapReduce: word count. The required pom.xml dependencies are:

<!-- version info -->
    <properties>
        <log4j.version>2.5</log4j.version>
        <hadoop.version>2.7.2</hadoop.version>
        <scopeType>provided</scopeType>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.ikeguang</groupId>
            <artifactId>common</artifactId>
            <version>1.0-SNAPSHOT</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
            <scope>${scopeType}</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>${hadoop.version}</version>
            <scope>${scopeType}</scope>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-mapreduce-client-core -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>${hadoop.version}</version>
            <scope>${scopeType}</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-common</artifactId>
            <version>${hadoop.version}</version>
            <scope>${scopeType}</scope>
        </dependency>
    </dependencies>
XML

Code

1) The WordCountMapper.java program:

package org.ikeguang.hadoop.mapreduce.wordcount;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/**
 * Created by keguang on 2019-12-07.
 */
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Split the line on spaces and emit (word, 1) for each token.
        String[] words = value.toString().split(" ");
        for(String word : words){
            context.write(new Text(word), new IntWritable(1));
        }
    }
}
Java
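As an optional refinement (not in the original code), the mapper can reuse its output objects instead of allocating a new Text and IntWritable for every token, a common MapReduce idiom that reduces garbage-collection pressure. The class name below is hypothetical:

package org.ikeguang.hadoop.mapreduce.wordcount;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

// Variant of WordCountMapper that reuses Writable instances across map() calls.
public class WordCountMapperReuse extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final Text outKey = new Text();
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Split on runs of whitespace and skip empty tokens.
        for (String word : value.toString().split("\\s+")) {
            if (!word.isEmpty()) {
                outKey.set(word);
                context.write(outKey, ONE);
            }
        }
    }
}
Java

Both versions produce the same output; only the allocation behavior differs.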

2) The WordCountReducer.java program:

package org.ikeguang.hadoop.mapreduce.wordcount;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

/**
 * Created by keguang on 2019-12-07.
 */
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Sum the partial counts for this word and emit (word, total).
        int sum = 0;
        for(IntWritable val : values){
            sum = sum + val.get();
        }

        context.write(key, new IntWritable(sum));
    }
}
Java

3) The WordCountDriver.java program:

package org.ikeguang.hadoop.mapreduce.wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.ikeguang.hadoop.util.HdfsUtil;

/**
 * Created by keguang on 2019-12-07.
 */
public class WordCountDriver extends Configured implements Tool{

    public static void main(String[] args) throws Exception {
        int ec = ToolRunner.run(new Configuration(),new WordCountDriver(),args);
        System.exit(ec);
    }

    @Override
    public int run(String[] args) throws Exception {
        // Use the configuration injected by ToolRunner and hand it to the job,
        // so that -D command-line options actually take effect.
        Configuration conf = getConf();
        Job job = Job.getInstance(conf, "wordcount");

        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Input and output paths from the command line
        String inpath = args[0];
        String output_path = args[1];

        FileInputFormat.addInputPath(job, new Path(inpath));

        // Delete the output directory if it already exists; otherwise the job would fail.
        if(HdfsUtil.existsFiles(conf,output_path)){
            HdfsUtil.deleteFolder(conf,output_path);
        }

        // Scan the input path recursively
        FileInputFormat.setInputDirRecursive(job,true);

        // Combine small input files into larger splits
        job.setInputFormatClass(CombineTextInputFormat.class);

        // Each map processes at least 128 MB of input...
        CombineTextInputFormat.setMinInputSplitSize(job,134217728);
        // ...and at most 256 MB
        CombineTextInputFormat.setMaxInputSplitSize(job, 268435456L);

        // job.setNumReduceTasks(10);
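        // Optional tweak (not in the original code): summing partial counts is
        // associative and commutative, so the reducer could also be registered
        // as a map-side combiner to shrink shuffle traffic:
        // job.setCombinerClass(WordCountReducer.class);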

        // Output path
        FileOutputFormat.setOutputPath(job,new Path(output_path));

        return job.waitForCompletion(true)?0:1;
    }
}
Java
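HdfsUtil comes from the org.ikeguang:common dependency and its source is not listed in this chapter. A minimal sketch, assuming the two methods the driver calls are thin wrappers over Hadoop's standard FileSystem API, might look like this:

package org.ikeguang.hadoop.util;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.IOException;

// Hypothetical reconstruction of the helper used by WordCountDriver.
public class HdfsUtil {

    // True if the given HDFS path exists.
    public static boolean existsFiles(Configuration conf, String path) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        return fs.exists(new Path(path));
    }

    // Recursively delete the given HDFS directory.
    public static boolean deleteFolder(Configuration conf, String path) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        return fs.delete(new Path(path), true);
    }
}
Java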

To count the words in an English text, launch the program with:

hadoop jar hadoop-1.0-SNAPSHOT.jar org.ikeguang.hadoop.mapreduce.wordcount.WordCountDriver /data/wordcount/input /data/wordcount/output
Bash
  • hadoop-1.0-SNAPSHOT.jar: the name of the final jar;
  • org.ikeguang.hadoop.mapreduce.wordcount.WordCountDriver: the main class (entry point);
  • /data/wordcount/input: the HDFS input directory;
  • /data/wordcount/output: the HDFS output directory.

Result

Bingley 3
But 2
England;    1
Her 1
However 1
I   15
IT  1
Indeed  1
Jane,   1
Lady    1
Lizzy   2

But having to write code is still a barrier to entry. With Hive SQL (HQL for short), all it takes is:

select word, count(1) from table group by word;
SQL

Note: this assumes the word column already holds one word per row; a sketch for raw text input follows below.
Anyone who knows basic SQL can get started with Hive for big data analysis, and the benefit of that is immeasurable.
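
In practice the input is usually raw text lines rather than a pre-tokenized word column. A minimal sketch of the full HQL workflow, using a hypothetical docs table with a single line column, could look like this:

-- Hypothetical table: one line of raw text per row.
CREATE TABLE docs (line STRING);

-- Load the HDFS input used by the MapReduce job above
-- (note: LOAD DATA INPATH moves the files into the table's directory).
LOAD DATA INPATH '/data/wordcount/input' INTO TABLE docs;

-- Split each line on spaces, explode into one word per row, then count.
SELECT word, count(1) AS cnt
FROM docs LATERAL VIEW explode(split(line, ' ')) t AS word
GROUP BY word;
SQL

Under the default engine of that era, Hive compiles this query into essentially the same kind of MapReduce job written by hand above.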

Author: 柯廣的網絡日志

WeChat official account: Java大數據與數據倉庫