
Modify the WordCount program so that it outputs the count of each distinct word in each file. The output of this DocWordCount program should have the form ‘word<delimiter>filename<tab>count’, where ‘<delimiter>’ stands for the delimiter between word and filename, and a tab separates filename and count. Submit your source code in a file named DocWordCount.java.
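As a minimal sketch of the record layout described above, the helper below joins a word and a file name with a delimiter and appends the count after a tab. The class name RecordFormat, the method formatRecord, and the "#" delimiter in the usage are placeholders for illustration, not part of the assignment; substitute the delimiter the assignment specifies.

```java
// Hypothetical helper illustrating the 'word<delimiter>filename<TAB>count' layout.
class RecordFormat {

    // Joins word and file name with the given delimiter, then appends a tab and the count.
    static String formatRecord(String word, String fileName, int count, String delimiter) {
        return word + delimiter + fileName + "\t" + count;
    }

    public static void main(String[] args) {
        // "#" is a placeholder delimiter chosen only for this illustration.
        System.out.println(formatRecord("Hadoop", "file1.txt", 2, "#"));
    }
}
```

In a Hadoop job that uses the default text output format, the key and the value are written separated by a tab, so it is usually enough to place word + delimiter + filename in the key and the count in the value.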

Explanation: Consider two simple files, file1.txt and file2.txt.

$ echo "Hadoop is yellow Hadoop" > file1.txt
$ echo "yellow Hadoop is an elephant" > file2.txt

Running ‘DocWordCount.java’ on these two files will give output similar to that below, with the delimiter placed between each word and its filename.

Output of DocWordCount.java

yellow<delimiter>file2.txt	1
Hadoop<delimiter>file2.txt	1
is<delimiter>file2.txt	1
elephant<delimiter>file2.txt	1
yellow<delimiter>file1.txt	1
Hadoop<delimiter>file1.txt	2
is<delimiter>file1.txt	1
an<delimiter>file2.txt	1
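The counts in this example can be reproduced without Hadoop. The sketch below is a plain-Java model, not the DocWordCount solution itself: it tokenizes the two example lines with the same word-boundary pattern that the WordCount mapper uses and counts each (word, filename) pair. The class name DocWordCountModel and the "#" delimiter are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Plain-Java model of the per-file word count; no Hadoop classes are involved.
class DocWordCountModel {

    // Same word-boundary pattern as the WordCount mapper.
    private static final Pattern WORD_BOUNDARY = Pattern.compile("\\s*\\b\\s*");

    // Counts each (word, file) pair; keys have the form word + delimiter + fileName.
    static Map<String, Integer> count(Map<String, String> files, String delimiter) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            for (String word : WORD_BOUNDARY.split(file.getValue())) {
                if (word.isEmpty()) {
                    continue; // the pattern can yield empty tokens
                }
                counts.merge(word + delimiter + file.getKey(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("file1.txt", "Hadoop is yellow Hadoop");
        files.put("file2.txt", "yellow Hadoop is an elephant");
        // "#" is a placeholder delimiter for this illustration only.
        count(files, "#").forEach((key, n) -> System.out.println(key + "\t" + n));
    }
}
```

The model prints its keys in sorted order because it uses a TreeMap; the order of lines in real job output depends on how Hadoop sorts and partitions the keys.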

Initial code that needs to be modified:

package org.myorg;

import java.io.IOException;
import java.util.regex.Pattern;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.log4j.Logger;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

public class WordCount extends Configured implements Tool {

    private static final Logger LOG = Logger.getLogger(WordCount.class);

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new WordCount(), args);
        System.exit(res);
    }

    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), " wordcount ");
        job.setJarByClass(this.getClass());

        // args[0] holds the input path(s); args[1] is the output directory.
        FileInputFormat.addInputPaths(job, args[0]);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        private static final Pattern WORD_BOUNDARY = Pattern.compile("\\s*\\b\\s*");

        @Override
        public void map(LongWritable offset, Text lineText, Context context)
                throws IOException, InterruptedException {

            String line = lineText.toString();
            Text currentWord = new Text();

            // Split the line on word boundaries and emit (word, 1) for each non-empty token.
            for (String word : WORD_BOUNDARY.split(line)) {
                if (word.isEmpty()) {
                    continue;
                }
                currentWord = new Text(word);
                context.write(currentWord, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text word, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            // Sum all the counts emitted for this key.
            int sum = 0;
            for (IntWritable count : counts) {
                sum += count.get();
            }
            context.write(word, new IntWritable(sum));
        }
    }
}
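One possible direction for the modification, sketched here in plain Java rather than as a finished Hadoop class: the map step emits word + delimiter + filename as the key instead of the bare word, and the reducer stays as it is because it only sums the values for each key. In an actual mapper the file name is commonly read from the input split, for example with ((FileSplit) context.getInputSplit()).getPath().getName() using org.apache.hadoop.mapreduce.lib.input.FileSplit; that assumes the input format yields FileSplit instances, as the default text input format does. The class DocMapStep, its map method signature, and the "#" delimiter are hypothetical names used only for this illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Plain-Java model of the modified map step; in a real mapper the pairs
// would be passed to context.write(...) instead of being collected in a list.
class DocMapStep {

    // Same word-boundary pattern as the WordCount mapper.
    private static final Pattern WORD_BOUNDARY = Pattern.compile("\\s*\\b\\s*");

    // Emits (word + delimiter + fileName, 1) for every non-empty token in the line.
    static List<Map.Entry<String, Integer>> map(String fileName, String line, String delimiter) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : WORD_BOUNDARY.split(line)) {
            if (word.isEmpty()) {
                continue;
            }
            out.add(new SimpleEntry<>(word + delimiter + fileName, 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // "#" is a placeholder delimiter for this illustration only.
        for (Map.Entry<String, Integer> pair : map("file1.txt", "Hadoop is yellow Hadoop", "#")) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

Because the file name is part of the key, the same word in two different files produces two different keys, so the unchanged reducer yields one count per word per file.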
