You work for a cloud application provider, DeveloperParadise, that needs to convince its investors that the business has a bright future.
Your task is to show that the number of customers (users of DeveloperParadise, which happens to compete with Atlassian's BitBucket) is comparable to that of BitBucket. The presentation for investors is next week (completely incidentally, exactly on the day this project is due).
Feel free to use the examples in Project1.ipynb or any other information you can gather to obtain as complete a list as possible of the users and repositories hosted on BitBucket.
While a template for the search strategy is provided, feel free to use any means (including borrowing with attribution from the other teams) to get as many as you can within this very short but, unfortunately, very realistic time frame.
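One possible starting point for such a search is to walk a paginated listing and record every repository name seen. The sketch below assumes the Bitbucket Cloud REST API 2.0 public repository listing at https://api.bitbucket.org/2.0/repositories, whose pages carry a `values` list and a `next` URL. The names `fetch_json`, `collect_repositories`, `owners`, and `max_pages` are hypothetical, and authentication, rate limits, and retries are not handled. It is not the template from Project1.ipynb.

```python
import json
import urllib.request

# Assumed public listing endpoint of the Bitbucket Cloud REST API 2.0.
START_URL = "https://api.bitbucket.org/2.0/repositories"


def fetch_json(url):
    """Download one page and decode it as JSON (no retries, no rate-limit handling)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def collect_repositories(start_url=START_URL, fetch=fetch_json, max_pages=5):
    """Follow `next` links and return the set of `full_name` values seen.

    `fetch` is injectable so the loop can be exercised without network access.
    """
    seen = set()
    url = start_url
    pages = 0
    while url and pages < max_pages:
        page = fetch(url)
        for repo in page.get("values", []):
            name = repo.get("full_name")
            if name:  # missing data: skip entries that carry no name
                seen.add(name)
        url = page.get("next")  # absent on the last page
        pages += 1
    return seen


def owners(full_names):
    """Derive the owner part of each 'owner/repository' name."""
    return {name.split("/", 1)[0] for name in full_names}
```

Calling `collect_repositories(max_pages=2)` would fetch two pages from the live service, and `owners(...)` on the result gives a rough set of accounts. A full crawl needs a much larger page budget and polite throttling.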
Please keep in mind operational data pitfalls, such as
- lack of context
- missing data
- possibly incorrect data
Also consider the following:
- DeveloperParadise may focus on specific types of customers (e.g., by size, programming language, or other characteristics), so the relevant count is a subset of all customers.
- Pick the desired characteristics of target customers for DeveloperParadise and estimate what percentage of your retrieved set they make up.
- Is there any way to estimate what fraction of all BitBucket customers/repositories are in the retrieved set?
- Are the distributions of customer types the same in the retrieved and undiscovered sets of BitBucket customers?
You can git clone the https://github.com/fdac/TeamX repository, where X is the number of your team. You can also push to that repository (if you are on that team).