This is a simple set of bash functions for manipulating Amazon Elastic MapReduce (EMR) clusters.
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
You must install the AWS Command Line Interface.
You must then set up a credentials file with your default EMR settings and set EMR_HOME to the directory that holds it, which is the install root directory of the Ruby client:
export EMR_HOME=/path/to/credentialsFileFolder
Finally, you must source the setenv.sh file:
. setenv.sh
Setting EMR_CRED_JSON allows you to override the default credentials.json file.
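The lookup order described above can be sketched as follows. This is a minimal illustration, not the actual contents of setenv.sh: the function name emr_cred_path is hypothetical, and it assumes the default file is named credentials.json and sits directly inside EMR_HOME.

```shell
# Hypothetical helper (not part of setenv.sh): print the credentials
# file that would be used.
# Assumption: the default file is $EMR_HOME/credentials.json.
emr_cred_path() {
  if [ -n "${EMR_CRED_JSON:-}" ]; then
    # An explicit override always wins.
    printf '%s\n' "$EMR_CRED_JSON"
  elif [ -n "${EMR_HOME:-}" ]; then
    printf '%s\n' "$EMR_HOME/credentials.json"
  else
    echo "EMR_HOME is not set" >&2
    return 1
  fi
}
```

Running emr_cred_path before any other command is a quick way to confirm which file a shell session would pick up.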
To find an existing cluster:
emrlist
To attach to a cluster using a flow id:
emrset <flow id>
To get the current flow id:
emrset
To remotely log in to the master node of the current flow id:
emrlogin
To remotely log in with just the IP address:
emrlogin <ip address>
Note that most commands will take a flow id or an IP address as an argument to override the default flow id set using emrset.
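The override rule can be sketched as below. This is an illustration under stated assumptions rather than the functions' real code: the name emr_target and the variable EMR_FLOW_ID (standing in for wherever emrset keeps the default) are hypothetical, and an argument is treated as an IP address only when it consists of digits and dots with at least three dots.

```shell
# Hypothetical sketch: decide which target a command should act on.
# Assumption: the default flow id lives in a variable named EMR_FLOW_ID.
emr_target() {
  # Use the argument if given, otherwise fall back to the default flow id.
  set -- "${1:-${EMR_FLOW_ID:-}}"
  if [ -z "$1" ]; then
    echo "no flow id set; run emrset <flow id> first" >&2
    return 1
  fi
  case "$1" in
    *[!0-9.]*) echo "flow $1" ;;  # contains other characters: a flow id
    *.*.*.*)   echo "ip $1" ;;    # only digits and dots: an IP address
    *)         echo "flow $1" ;;
  esac
}
```

With EMR_FLOW_ID set, emr_target with no argument reports the default flow, while emr_target 10.0.0.5 reports the IP address instead.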
This is shorthand for calling from the shell.
emr <some args>
When you start a flow on EMR, you will be given a flow id. Use emrset to set the flow id for use by many of the other commands:
emrset <flow id>
Calling emrset without the id returns the current flow id.
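The set-or-get behaviour described above can be sketched as a single function. This is a stand-in for illustration only: the name my_emrset and the variable EMR_FLOW_ID are hypothetical, and the real emrset may store the id differently or print something when setting it.

```shell
# Hypothetical stand-in for emrset (the real function may differ).
# Assumption: the current flow id is kept in the variable EMR_FLOW_ID.
my_emrset() {
  if [ "$#" -gt 0 ]; then
    # With an argument: remember it as the current flow id.
    EMR_FLOW_ID="$1"
    export EMR_FLOW_ID
    return 0
  fi
  if [ -z "${EMR_FLOW_ID:-}" ]; then
    echo "no flow id set" >&2
    return 1
  fi
  # Without an argument: print the current flow id.
  printf '%s\n' "$EMR_FLOW_ID"
}
```

Because the id is stored in the current shell, the function has to be sourced and called directly rather than run as a separate script.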
Will return all job flows created in the last 2 days.
Will return the current master node on the EMR cluster.
Will remotely log in to the master node.
Will return the current status of a given running flow.
Will terminate your remote EMR cluster.
Will launch screen on the master node. Screen must be already installed. If a screen instance is already running, this command will automatically attach.
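One way to get this attach-or-create behaviour is screen's -d -R options over an ssh session with a terminal allocated. The sketch below only prints the command so it can be inspected first. The function name emr_screen_cmd is hypothetical, the hadoop login user is an assumption, and the real command may be built differently.

```shell
# Hypothetical dry-run helper: print an ssh command that attaches to a
# running screen session on the master node, or creates one if none exists.
# Assumption: the login user on the master node is "hadoop".
emr_screen_cmd() {
  if [ -z "${1:-}" ]; then
    echo "usage: emr_screen_cmd <master host>" >&2
    return 1
  fi
  # ssh -t forces a terminal, which screen needs.
  # screen -d -R reattaches a session, creating it first if necessary.
  echo "ssh -t hadoop@$1 screen -d -R"
}
```

Printing the command instead of running it keeps the sketch safe to try without a live cluster.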
Will automatically 'tail' the current flow step logs.
emrtail 2
Without a step number, a list of available steps will be displayed.
Will create a local SOCKS proxy to the master node. This is useful for accessing the JobTracker and NameNode web interfaces. You must install FoxyProxy in Firefox for this to work best.
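A SOCKS proxy of this kind is typically opened with ssh dynamic port forwarding. The sketch below prints such a command rather than running it. The function name emr_proxy_cmd, the default port 6666, and the hadoop login user are assumptions made for illustration, not values taken from these functions.

```shell
# Hypothetical dry-run helper: print an ssh command that opens a local
# SOCKS proxy through the master node.
# Assumptions: login user "hadoop", local port 6666 unless overridden.
emr_proxy_cmd() {
  if [ -z "${1:-}" ]; then
    echo "usage: emr_proxy_cmd <master host> [local port]" >&2
    return 1
  fi
  # -D opens a dynamic (SOCKS) forward on the local port.
  # -N keeps the connection open without running a remote command.
  echo "ssh -N -D ${2:-6666} hadoop@$1"
}
```

FoxyProxy would then be pointed at localhost and the chosen port as a SOCKS proxy, so that browser requests for the cluster's web interfaces travel through the tunnel.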
Will scp a given file to the remote master node.
emrscp my-hadoop-app.jar
This is useful if you leave your EMR cluster running and want to manually spawn jobs from emrlogin or emrscreen.
Will scp all conf/*-site.xml files from the master node into the given directory.
emrconf local-conf
This is useful if you leave your EMR cluster running on an AWS VPC and wish to run Hadoop jobs from a local shell.
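The copy step can be pictured as a single scp with a quoted remote glob. The sketch below prints the command instead of running it. The function name emr_conf_cmd, the hadoop login user, and the conf directory being relative to that user's home directory are all assumptions.

```shell
# Hypothetical dry-run helper: print an scp command that fetches the
# Hadoop site configuration files from the master node.
# Assumptions: login user "hadoop"; conf/ is relative to its home directory.
emr_conf_cmd() {
  if [ -z "${1:-}" ] || [ -z "${2:-}" ]; then
    echo "usage: emr_conf_cmd <master host> <local dir>" >&2
    return 1
  fi
  # The remote path is single-quoted so the glob expands on the master
  # node rather than in the local shell.
  echo "scp 'hadoop@$1:conf/*-site.xml' $2/"
}
```

Once the files are in place, a local Hadoop client can be pointed at them, for example with hadoop --config local-conf fs -ls /, provided the local machine can reach the cluster's addresses.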