- Build this docker image
- Run `bash` as the command with an interactive tty to get into the image:
  `docker run --rm -it ${whatever-you-named-the-image} /bin/bash`
- The data is in the directory `/root/data` on said image
- Create a Pull Request with your code for review
You're free to use whatever language you want, as long as you include instructions on how to run your code. (Bonus points if you modify the `Dockerfile` instead.)

Note that you do not have to use a Big Data stack like Hadoop or Spark. If you do use one, provide docker-swarm or kubernetes configuration file(s) in your Pull Request that will set up the cluster, or else we won't be able to run the code.
output should be a simple number
sample output
5
create a csv file that shows the word count of every value of every dataset (dataset being a .csv file)
output should be a csv file that has a header row with fields `value` and `count` and one entry for every value found:
sample output
value,count
some value,435
another value,234
word,45
...
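
One possible way to produce a file in that format is sketched below in Python. It is only an illustration, not the expected solution. It rests on several assumptions that the task text does not state: a "value" is taken to mean a single cell, header rows are counted like any other row, only `.csv` files directly inside the directory are read, and the files are UTF-8. The function names `count_values` and `write_counts` are hypothetical.

```python
import csv
from collections import Counter
from pathlib import Path


def count_values(data_dir):
    """Count how often each cell value occurs across all .csv files in data_dir.

    Assumption: header rows are not treated specially, so column names
    are counted as values too. Empty cells are counted as the value "".
    """
    counts = Counter()
    for path in sorted(Path(data_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.reader(handle):
                counts.update(row)
    return counts


def write_counts(counts, out_path):
    """Write a value,count CSV with the most frequent values first."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["value", "count"])
        writer.writerows(counts.most_common())
```

Inside the container, calling `write_counts(count_values("/root/data"), "value_counts.csv")` would write the result to the current directory. The output file name is arbitrary; writing it outside `/root/data` keeps it from being counted on a later run.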
output should be a simple number
sample output
1000000000