WordCount Case Practice
1. Local testing
1) Requirements
Count the total number of occurrences of each word in a given text file and output the results.
(1) Input data
ss ss
cls cls
jiao
banzhang
xue
hadoop
(2) Expected output data
banzhang 1
cls 2
hadoop 1
jiao 1
ss 2
xue 1
2) Requirements analysis
Following the MapReduce programming specification, write the Mapper, Reducer, and Driver separately.
Requirement: count the number of occurrences of each word in a set of files (the WordCount case).
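To make the analysis concrete, here is how the sample input above flows through the three stages:
Map: each line is split on spaces and emitted as (word, 1) pairs, e.g. the line "ss ss" becomes (ss, 1), (ss, 1).
Shuffle: the framework sorts and groups the pairs by key, e.g. (banzhang, [1]), (cls, [1, 1]), ..., (ss, [1, 1]), (xue, [1]).
Reduce: each key's values are summed, producing the expected output, e.g. ss 2.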
3) Environment preparation
(1) Create a Maven project named MapReduceDemo.
(2) Add the following dependencies to pom.xml:
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>3.1.3</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>1.7.30</version>
    </dependency>
</dependencies>
(3) In the project's src/main/resources directory, create a new file named log4j.properties and fill it with:
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n
(4) Create a package named com.zs.mapreduce.wordcount.
4) Writing the program
(1) Write the Mapper class
package com.zs.mapreduce.wordcount;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper; // the mapreduce package is the 2.x/3.x API; mapred is the legacy 1.x API
import java.io.IOException;

/**
 * KEYIN:    map-stage input key type    - LongWritable (byte offset of the line)
 * VALUEIN:  map-stage input value type  - Text (one line of text)
 * KEYOUT:   map-stage output key type   - Text (a word)
 * VALUEOUT: map-stage output value type - IntWritable (the count 1)
 */
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Reuse the output key/value objects across map() calls to save allocations
    private Text outK = new Text();
    private IntWritable outV = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // 1. Read one line of text
        String line = value.toString();
        // 2. Split the line into words
        String[] words = line.split(" ");
        // 3. Emit a (word, 1) pair for each word
        for (String word : words) {
            // set the output key to the current word
            outK.set(word);
            // write the pair
            context.write(outK, outV);
        }
    }
}
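To sanity-check the Mapper without running a full job, one option (not part of the original steps) is Apache MRUnit; the sketch below assumes the test dependency org.apache.mrunit:mrunit:1.1.0 (classifier hadoop2) has been added next to junit.

package com.zs.mapreduce.wordcount;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.mrunit.mapreduce.MapDriver;
import org.junit.Test;

public class WordCountMapperTest {
    @Test
    public void mapEmitsOnePairPerWord() throws Exception {
        // Feed one input line and assert the exact (word, 1) pairs, in order
        MapDriver.newMapDriver(new WordCountMapper())
                .withInput(new LongWritable(0), new Text("ss ss"))
                .withOutput(new Text("ss"), new IntWritable(1))
                .withOutput(new Text("ss"), new IntWritable(1))
                .runTest();
    }
}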
(2) Write the Reducer class
package com.zs.mapreduce.wordcount;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;

/**
 * KEYIN:    reduce-stage input key type    - Text (a word)
 * VALUEIN:  reduce-stage input value type  - IntWritable (a count)
 * KEYOUT:   reduce-stage output key type   - Text (a word)
 * VALUEOUT: reduce-stage output value type - IntWritable (the total count)
 */
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable outV = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0;
        // Sum the counts for this word
        for (IntWritable value : values) {
            sum += value.get();
        }
        outV.set(sum);
        // Write the (word, total) pair
        context.write(key, outV);
    }
}
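A matching sketch for the Reducer, under the same MRUnit assumption; it feeds in the grouped values that the shuffle would produce for the key ss:

package com.zs.mapreduce.wordcount;

import java.util.Arrays;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.mrunit.mapreduce.ReduceDriver;
import org.junit.Test;

public class WordCountReducerTest {
    @Test
    public void reduceSumsTheCounts() throws Exception {
        // "ss" appears twice in the sample input, so reduce sees [1, 1] and should emit 2
        ReduceDriver.newReduceDriver(new WordCountReducer())
                .withInput(new Text("ss"), Arrays.asList(new IntWritable(1), new IntWritable(1)))
                .withOutput(new Text("ss"), new IntWritable(2))
                .runTest();
    }
}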
(3) Write the Driver class
package com.zs.mapreduce.wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.IOException;

public class WordCountDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // 1. Get a Job instance
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        // 2. Set the jar path
        job.setJarByClass(WordCountDriver.class);
        // 3. Associate the Mapper and Reducer
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        // 4. Set the map output key/value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // 5. Set the final output key/value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // 6. Set the input and output paths
        FileInputFormat.setInputPaths(job, new Path("D:\\software\\hadoop\\input\\inputword"));
        FileOutputFormat.setOutputPath(job, new Path("D:\\software\\hadoop\\output\\output1"));
        // 7. Submit the job and wait for it to finish
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}
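Note that the Driver above hardcodes local Windows paths, while the cluster command in section 2 passes the input and output paths as program arguments. For the cluster run, a variant of step 6 that reads the paths from args looks like this (the rest of the Driver is unchanged):

// 6. Take the input and output paths from the command-line arguments
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));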
5) Local testing
(1) First configure the HADOOP_HOME environment variable and the Windows runtime dependencies (e.g. winutils).
(2) Run the program in IDEA/Eclipse. Note: delete the output directory before re-running; the job fails if the output path already exists.
2. Submitting to the cluster for testing
(1) Build the jar package with Maven. Add the following packaging plugin configuration to pom.xml:
<build>
    <plugins>
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.6.1</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
Note: if the project shows a red cross (error marker), right-click the project -> Maven -> Reimport to refresh it.
(2) Package the program into a jar, as shown below.
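For example, from the command line (or via the Maven panel in IDEA):

mvn clean package

With the assembly plugin above, the target directory will contain both a plain jar and a jar-with-dependencies jar; the exact file names depend on the artifactId and version in your pom.xml.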
(3) Rename the jar without bundled dependencies to wc.jar and copy it to the /opt/module/hadoop-3.1.3 directory on the Hadoop cluster. A tool such as XShell can be used to transfer it to Linux.
(4) Start the Hadoop cluster:
[zs@hadoop102 hadoop-3.1.3]$ sbin/start-dfs.sh
[zs@hadoop103 hadoop-3.1.3]$ sbin/start-yarn.sh
(5) Run the WordCount program:
[zs@hadoop102 hadoop-3.1.3]$ hadoop jar wc.jar com.zs.mapreduce.wordcount.WordCountDriver /user/zs/input /user/zs/output
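If the input directory does not yet exist on HDFS, create it and upload the data before running the job (the local file name word.txt here is just an example); after the job finishes, view the result with hadoop fs -cat:

[zs@hadoop102 hadoop-3.1.3]$ hadoop fs -mkdir -p /user/zs/input
[zs@hadoop102 hadoop-3.1.3]$ hadoop fs -put word.txt /user/zs/input
[zs@hadoop102 hadoop-3.1.3]$ hadoop fs -cat /user/zs/output/part-r-00000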
Keep going, and thanks for reading!