This project will include some Spark simple exercises based in the airline dataset.
The input format used in the exercises will be a extract for the Flights Dataset.
The file will be separated by commas and each line will have the following structure:
| Field Name | Description | Field Type |
|---|---|---|
| Year | year | Int |
| Month | month | Int |
| DayofMonth | day of month | Int |
| DayOfWeek | day of week | Int |
| DepTime | flight's departure time using the format: [h]hmm being: h-> hour, m-> minute and [] optional int position | Int |
| CRSDepTime | flight's estimated departure time using the format: [h]hmm being: h-> hour, m-> minute and [] optional int position | Int |
| ArrTime | flight's arrive time using the format: [h]hmm being: h-> hour, m-> minute and [] optional int position | Int |
| CRSArrTime | flight's estimated arrive time using the format: [h]hmm being: h-> hour, m-> minute and [] optional int position | Int |
| UniqueCarrier | unique id by carrier | String |
| FlightNum | Flight Number | Int |
| TailNum | N/A | N/A |
| ActualElapsedTime | flight's real duration in minutes | Int |
| CRSElapsedTime | flight's calculated duration in minutes | Int |
| AirTime | N/A | N/A |
| ArrDelay | flight's arrival delay | Int |
| DepDelay | flight's departure delay | Int |
| Origin | origin airport | String |
| Dest | destination airport | String |
| Distance | flight's disntance in meters | Int |
| TaxiIn | N/A | N/A |
| TaxiOut | N/A | N/A |
| Cancelled | Determinate if the flight has been cancelled or not. 0-> OnTime, 1 -> Cancel | Structuted Field |
| CancellationCode | Cancelation code | Int |
| Diverted | N/A | |
| CarrierDelay | Determinate if the flight has been cancelled beacause carrier issues or not. 0-> OnTime, 1 -> Cancel | Structuted Field |
| WeatherDelay | Determinate if the flight has been cancelled beacause weathers issues or not. 0-> OnTime, 1 -> Cancel | Structuted Field |
| NASDelay | Determinate if the flight has been cancelled beacause NAS issues or not. 0-> OnTime, 1 -> Cancel | Structuted Field |
| SecurityDelay | Determinate if the flight has been cancelled beacause Security issues or not. 0-> OnTime, 1 -> Cancel | Structuted Field |
| LateAircraftDelay | Determinate if the flight has been cancelled beacause Aircraft issues or not. 0-> OnTime, 1 -> Cancel | IStructuted Fieldnt |