Cilindir

ML framework for human avatar reconstruction and real-time manipulation from visual data

Partner Intro

Michael Luciuk, michael.luciuk@cilindir.ca

Cilindir is a startup that aims to improve remote human-to-human communication, fostering collaboration and engagement in remote work. The product will let users immerse themselves in a virtual reality where they can have more natural, human-like interactions remotely.

Project Description

Current remote-communication tools such as Zoom fall short of fostering good communication. Cilindir aims to make remote human-to-human communication far more interactive by capturing user data with a network of cameras and reconstructing the person as a 3D model. This lets you see the people you are speaking with as if you were actually in the room with them, drastically improving remote communication and making things like meetings and classes more interactive.

Key Features

The main feature Cilindir will provide is a virtual character driven by a network of cameras that captures the user's entire body, with lifelike detail and full body language. This is the part our team is currently developing. Once the project is complete, we expect the following use cases to be implemented:

  1. Real-time pose estimation to move a model in the virtual space according to the user's movements
  2. Rendering reconstructed 3D models of users into Unreal Engine
  3. The ability to interact with realistic models of other users in a virtual space

Instructions and Demos

Demo Video for Deliverable 5

The Cilindir product works using virtual reality pods, which have an integrated network of cameras to scan your movements as well as a series of projectors that let you see who you are speaking with. Since the product is in the early stages of development, we expect testing to be done with previously collected data, which is then used to test the movements of the virtual character. Currently, we are able to test the movement of an Unreal Engine mannequin that has been rigged with controls using real-time pose keypoint data. Note that this project is mainly for developer use, as users are not expected to have all the necessary tools (such as a dedicated network of cameras) at their disposal. For now, developers can test the application and build on it. This test demo is explained below:

Live Pose Data UDP Demo

To use the demo, you will need a webcam source on your computer. The webcam represents any live source and can simply be replaced with others, including one or more RTMP, RTSP, or other types of sources.

Automated script (Windows only): scripts\run_pose_data_demo.bat

Manual demo instructions:

  • Launch src/pose/stream_pose.py. It uses your webcam as a live source and runs the YOLOv11 Pose model to obtain pose data, then streams the normalized x and y coordinates over UDP to the configured IP and port.
  • To see the demo result, also run src/pose/udp_demo.py. Once it receives data, it opens a new window and draws the received keypoints as green dots on screen. The green keypoints will update as you move in front of the webcam.

This demo simulates how it would be possible to stream live pose data into an Unreal Engine socket using UDP, and provides a simple application using the data received.
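
For reference, below is a minimal sketch of the UDP sender/receiver pattern this demo relies on. The JSON message format, the keypoint names, and the 127.0.0.1:5005 address are illustrative assumptions on our part; check stream_pose.py and udp_demo.py for the actual message format and configuration the demo uses.

import json
import socket

ADDRESS = ("127.0.0.1", 5005)  # assumed IP/port; match the demo's configuration


def send_keypoints(sock, keypoints):
    """Send one frame of normalized (x, y) keypoints as a single UDP datagram."""
    sock.sendto(json.dumps(keypoints).encode("utf-8"), ADDRESS)


def receive_keypoints(sock):
    """Block until one datagram arrives and return the decoded keypoints."""
    data, _addr = sock.recvfrom(65535)
    return json.loads(data.decode("utf-8"))


if __name__ == "__main__":
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(ADDRESS)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    # One frame of normalized keypoints in [0, 1], as a pose model would produce.
    frame = {"nose": (0.52, 0.18), "left_wrist": (0.31, 0.66)}
    send_keypoints(sender, frame)
    print(receive_keypoints(receiver))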

Note that this demo covers only the UDP communication part of the project. For a more complete implementation using the actual hardware in the virtual reality pod product, check src/pose/realtime_pose_capture_YOLO.py and src/pose/realtime_pose_capture_mediapipe.py, which overlay pose estimates from different models on the original video stream. For more structured keypoint extraction, multiple cameras on one NVR could be used so that different angles are taken into consideration; for reference, we used three IP cameras from REVODATA. This setup requires cameras at fixed angles, and a lab to formalize those angles so that keypoints from the different views can be normalized.
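
As a starting point for such a multi-camera setup, below is a minimal sketch of reading several RTSP streams with OpenCV. The RTSP URLs and credentials are placeholders and not part of the current codebase; the actual endpoints depend on your NVR and cameras.

import cv2

# Placeholder RTSP URLs; substitute the endpoints exposed by your NVR/cameras.
CAMERA_URLS = [
    "rtsp://user:pass@192.168.1.10:554/stream1",
    "rtsp://user:pass@192.168.1.11:554/stream1",
    "rtsp://user:pass@192.168.1.12:554/stream1",
]

captures = [cv2.VideoCapture(url) for url in CAMERA_URLS]

while all(cap.isOpened() for cap in captures):
    results = [cap.read() for cap in captures]
    if not all(ok for ok, _ in results):
        break  # a stream dropped; stop here (or add reconnection logic)
    for i, (_, frame) in enumerate(results):
        # Each frame could be passed to a pose model here; keypoints from the
        # different fixed camera angles would then be normalized and fused.
        cv2.imshow(f"camera {i}", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

for cap in captures:
    cap.release()
cv2.destroyAllWindows()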

3D Reconstruction Demo

In order to run the 3D reconstruction of a user, developers must first take a picture of the user from the front. In the future, this can be improved to use more cameras that capture the user from all angles, producing a much better reconstruction.

  1. Place the image in src/3d_reconstruction/PIfu/scripts. Our current workspace includes a test image (test_img) that can be used if you do not want to take one yourself; if you use your own image, change the image file name in src/3d_reconstruction/PIfu/scripts/run.py.
  2. Install the required dependencies and packages listed in src/3d_reconstruction/PIfu/pifuhd/requirements.txt.
  3. Run src/3d_reconstruction/PIfu/scripts/run.py. An .obj file will be generated (developers can optionally convert it to .fbx).
  4. To view the file, use 3D modelling software such as Blender, or import the mesh into your Unreal Engine workspace. An example Unreal Engine project can be found in src/unreal/, with a git submodule leading to the project "Model Mover" (you may need to click the Model Mover title on the repository page to get the latest submodule updates, or use the link: https://github.com/lvince12/ModelMover).

Currently, the reconstructed 3D mesh is provided as a static mesh and can be converted into a skeletal mesh in Unreal Engine. Future developers may want to look at Unreal Engine skeletons in order to build a skeleton for the skeletal mesh, which can then be used to build the control rig.
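
If you would rather script the optional .obj-to-.fbx conversion than do it by hand, below is a minimal sketch using Blender's Python API. This is a suggestion rather than part of the current workflow: the file names are placeholders, the operator names are the Blender 3.x ones (the OBJ importer was renamed in Blender 4.x), and the script must be run through Blender itself (e.g. blender --background --python convert_obj_to_fbx.py).

import bpy

# Start from an empty scene so only the reconstructed mesh gets exported.
bpy.ops.wm.read_factory_settings(use_empty=True)

bpy.ops.import_scene.obj(filepath="reconstructed_user.obj")   # placeholder path
bpy.ops.export_scene.fbx(filepath="reconstructed_user.fbx")   # placeholder path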

Unreal Smooth Walk Demo

To use the demo, you will need Unreal Engine 5.5 installed on your computer. The demo uses a Python script to simulate perfect pose input and sends it to our virtual character in Unreal Engine.

Automated script (Windows only): scripts\run_smooth_walk_demo.bat

Manual demo instructions:

  • Launch src/unreal/test/smooth_walk.py to start sending data to Unreal Engine. This script uses our pose API to send sets of keypoints with their coordinates to Unreal Engine through a TCP socket.
  • To see the result, run src/unreal/unreal-test/test.uproject in Unreal Engine. Note that the unreal project is in a submodule, so depending on your git config, you may need to clone it separately.
  • Click on the "Play" button in Unreal Engine to start. It should start a TCP listener and listen for incoming data from the Python script. Once a connection is established, received data will be parsed to move the character rig in Unreal Engine.
  • In this demo, the character will walk smoothly in place according to the data received from the Python script.

This demo shows a best-case scenario of how the cameras from the pose estimation step would be set up to obtain perfect data. It uses a simulated, perfect set of data sent to Unreal Engine to show how the character would move smoothly at 100 FPS.

Note: Although the current physical prototype uses a 15 FPS camera, this demo showcases that the Unreal Engine project can handle input at any frame rate (including 100 FPS in this demo).
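
For a rough idea of what the sending side looks like, below is a minimal sketch that streams simulated keypoint data at a fixed rate through the Pose API (documented further down in this README). The bone names, coordinate values, and walking motion are illustrative assumptions; see src/unreal/test/smooth_walk.py for the data the demo actually sends.

import math
import time

from unreal.api.pose_api import UnrealEngineTCPServer

server = UnrealEngineTCPServer(host="localhost", port=15815)
server.start()

FPS = 100  # the demo targets 100 FPS; the UE side should handle any rate
try:
    frame = 0
    while True:
        t = frame / FPS
        # Swing two example "bones" back and forth to mimic a walking cycle.
        server.send_data({
            "left_leg": (0.0, math.sin(2 * math.pi * t), 0.0),
            "right_leg": (0.0, -math.sin(2 * math.pi * t), 0.0),
        })
        frame += 1
        time.sleep(1.0 / FPS)
except KeyboardInterrupt:
    pass
finally:
    server.stop()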

Unreal-Pose Full Pipeline Demo

To use the demo, you will need Unreal Engine 5.5 installed on your computer, as well as a webcam source. This demo puts everything from the Unreal team and the Pose team together using the Pose API, showcasing the full pipeline with a webcam as the video capture source.

Automated script (Windows only): scripts\run_full_pipeline_demo.bat

Manual demo instructions:

  • Launch src/pose/world_hollistic_landmark_detection.py. It uses your webcam as a live source and runs the MediaPipe Pose model to obtain pose landmark data. A preview of the camera and the recognized landmarks will be displayed in a new window. It then uses the Pose API to set up a TCP server and stream the user's real-world position to Unreal Engine through a TCP socket.
  • To see the result, run src/unreal/unreal-test/test.uproject in Unreal Engine. Note that the unreal project is in a submodule, so depending on your git config, you may need to clone it separately.
  • Click on the "Play" button in Unreal Engine to start a TCP listener, which will listen for incoming data from pose estimation. The data will be parsed to move the character rig in Unreal Engine.
  • Make sure your entire body is in the frame of the camera preview, then move your body (spread your arms, move your legs, etc.). The character in Unreal Engine should move according to your movements.

This demo showcases the entire pipeline from capturing the pose data with 1 camera, processing it, sending it to Unreal Engine, parsing it and moving the character rig accordingly in Unreal Engine.
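
In outline, the pipeline looks roughly like the sketch below: capture a frame, run MediaPipe Pose on it, and push the world landmarks through the Pose API to Unreal Engine. This is a simplified illustration using MediaPipe's legacy solutions interface; the landmark naming and error handling are assumptions, and src/pose/world_hollistic_landmark_detection.py is the authoritative implementation.

import cv2
import mediapipe as mp

from unreal.api.pose_api import UnrealEngineTCPServer

server = UnrealEngineTCPServer(host="localhost", port=15815)
server.start()

mp_pose = mp.solutions.pose
cap = cv2.VideoCapture(0)  # default webcam

with mp_pose.Pose() as pose:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_world_landmarks:
            # Send every landmark's real-world (x, y, z), keyed by landmark name.
            server.send_data({
                mp_pose.PoseLandmark(i).name: (lm.x, lm.y, lm.z)
                for i, lm in enumerate(results.pose_world_landmarks.landmark)
            })
        cv2.imshow("camera preview", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

cap.release()
cv2.destroyAllWindows()
server.stop()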

Note: Using only one camera (your webcam) may not produce the most accurate pose estimation results; it is used here only for ease of demonstration. The model output quality depends on the camera angle and webcam quality. The final Cilindir product will use a network of cameras to capture the user in 3D space, which will produce much more accurate pose estimation results. This network of cameras will ideally already be set up by the Cilindir team as the project moves forward. For a perfect scenario like this, check out the Unreal Smooth Walk Demo.

A Few Words For Future Developers

This is a relatively challenging project to work on due to its unconventional nature of having Unreal Engine as the "frontend" and Python as the "backend" (to describe it in full-stack terms). For us, the difficulty came mostly from unfamiliarity with Unreal Engine and the AI models we are using, so there was a bit of a learning curve. If you have experience with Unreal Engine and these AI models, it will probably be more straightforward. Also note that the project in its current stage is an early prototype and proof of concept; there is still a lot of work to be done before it can be used in a fully functional product.

Things to Note

You should fully understand the project and its structure, in terms of workflow and how the different parts interact, before you get started. Ask for clarification if you are confused! Don't be afraid to revamp parts of the project if you think they can be done a better way. Currently, there are many files in the repository that are not used by our main workflow; they were used for testing and to demonstrate our learning process, and they are unlikely to appear in the final product. Here are the important files (apart from the demos) that are functional and make up the core functionality:

  • src/pose/world_hollistic_landmark_detection.py: This is the main one-camera implementation for pose estimation using the MediaPipe Pose model. It captures the webcam input and streams the pose data to Unreal Engine using the Pose API.
  • src/unreal/api/pose_api.py: This is the Pose API that we built to stream pose data to Unreal Engine. It sets up a TCP server and sends the pose data to Unreal Engine. See the Pose API documentation below for more details.
  • All of src/unreal/unreal-test/: This directory contains the Unreal Engine project that we built to receive the pose data and move the character rig accordingly. It contains the TCPLib that we built to receive the pose data over TCP sockets and parse it into Unreal Engine maps. It also contains the character rig that we built to move programmatically according to the pose data received from the Pose API. (It still works when the mesh changes because we set bone names from the input data instead of hardcoding a set of bones to transform.) The default scene is a demo scene with just the character being controlled. To see our development environment, check Content/ThirdPerson/Maps/ThirdPersonMap in the UE content drawer. The rigged sample IK mesh is at /Game/Characters/Mannequins/Meshes/V2_SKM_Manny_Simple_IKRig, and the drag-and-drop rigged mannequin is at /Game/BP_Actor. The rigged mannequin already contains logic that uses TCPLib to receive pose data and move the character rig accordingly.
  • All of src/3d_reconstruction/PIfu/: This directory contains the final 3D reconstruction model chosen, which is PIFuHD. It contains the model and the scripts to run it. The main script to run is scripts/run.py, which takes an image of a user and reconstructs it into a 3D model. The reconstructed model can be viewed in Blender or imported into Unreal Engine. The reconstructed model is currently a static mesh, but it can be converted into a skeletal mesh in Unreal Engine. For more information on models and model choice, check src/3d_reconstruction/info.md.

Next Steps

Here are some suggestions for what to do next if you are a developer for this project, ranked from easiest to hardest (in our opinion):

  • Compile and package the UE project: In its current state, the Unreal Engine project compiles, but it reports missing plugins at runtime (they work fine in the development environment), so the packaged files don't work as expected. We don't expect this to be a difficult task, but we were unable to find a solution due to time constraints.
  • Further DevOps improvements: The current project only has a few simple tests and a linting system. You can improve on this by implementing an auto-build system using custom GitHub Actions runners to build the Unreal Engine project. (The Windows image on GitHub does not contain Unreal Engine, and we could not find a good way to install UE from the command line.) A stub for this is already included in the submodule at src/unreal/unreal-test/.github/workflows/unreal_build.yml. You can also add more tests for the Pose API and TCPLib to ensure that they are working as expected.
  • Set up multi-camera pose estimation: The current working implementation of pose estimation uses a single camera, which is not ideal. You can try to improve the pose estimation by using multiple cameras to estimate the pose landmarks/keypoints more accurately. This may require some changes to the MediaPipe model, or even swapping to a different model entirely (which is doable without breaking other parts of the project thanks to the Pose API). The subteam in our group working on pose estimation reports that you may need a place to set up cameras in fixed positions for the best results.
  • Set up rotation data input: The current implementation of the data pipeline between Python and Unreal Engine only contains keypoints and their coordinates. The rig in UE also accepts rotation data, which is currently hardcoded to a neutral(-ish) position. You can use the Pose API and TCPLib to send rotation data to Unreal Engine in a similar way to how we send the landmark keypoints and their coordinates. The UE side shouldn't be too difficult, but you will need to figure out how to derive the rotation data in Python from the keypoint coordinates and their relation to other keypoints; a small sketch of one possible starting point follows this list. (This is the current plan as of the end of the project, but if you have a better idea, feel free to do it your own way!)
  • Improve the 3D reconstruction: The current 3D reconstruction is done using a single image, which is not ideal. You can try to improve the reconstruction by using multiple images from different angles or by using a video stream instead of a single image. This may require some changes to the PIFuHD model, or even swapping to a different model entirely.
  • Implement the 3D reconstruction - Unreal pipeline: We currently have a working pipeline for pose estimation and sending the pose data to Unreal Engine. We also have a working 3D reconstruction model using PIFuHD. You can try to implement a pipeline that takes the reconstructed 3D model and sends it to Unreal Engine at runtime. We were unable to come up with a working plan within the timespan of the project, but it may be possible. You may need to create your own reconstruction API in src/unreal/api to do this, because the current Pose API does not have the data structures to send 3D models.
  • Improve the Unreal Engine project: The current Unreal Engine project is a simple demo that shows how to receive pose data and move the character rig accordingly. You can try to improve the project by adding more features, such as an actual user interface or multi-user support (multiple users in the same virtual environment). This may require some changes to TCPLib and UE, or an entire overhaul of the UE project. You could also make the UE environment the front end of the entire project and have it run the Python scripts in the background when needed.
  • Automated 3D reconstruction rigging: The current 3D reconstruction outputs a static mesh, but to make it move in Unreal Engine, it needs to be converted into a skeletal mesh. This is traditionally a manual process, but we initially had the idea of automating it with Python scripts. The thought process was to capture pose data from the user at the same time as the reconstruction and use that pose data to rig the reconstructed mesh. This is a very ambitious task; we were unable to come up with a working plan within the timespan of the project, and we are unsure if it is even possible. If you are able to complete this task (and most of the things above), you will have constructed a truly automated pipeline for the entire project, which would be a very impressive feat, and we would love to see it happen.
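
As a starting point for the rotation-data suggestion above, here is a minimal sketch of deriving a 2D rotation angle for a limb from two keypoint coordinates. The keypoint names, the single-angle representation, and the "_rotation" key convention are illustrative assumptions; the rotation format the UE rig actually expects would need to be confirmed on the Unreal side.

import math


def limb_angle_degrees(parent_xy, child_xy):
    """Angle of the parent->child segment relative to the +x axis, in degrees."""
    dx = child_xy[0] - parent_xy[0]
    dy = child_xy[1] - parent_xy[1]
    return math.degrees(math.atan2(dy, dx))


keypoints = {"left_shoulder": (0.42, 0.30), "left_elbow": (0.40, 0.45)}
angle = limb_angle_degrees(keypoints["left_shoulder"], keypoints["left_elbow"])

# One option: piggyback rotation values on the existing keypoint stream that
# the Pose API (documented below) already sends to Unreal Engine.
payload = dict(keypoints)
payload["left_upper_arm_rotation"] = (0.0, 0.0, angle)
# server.send_data(payload)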

Of course, these are just suggestions from us; feel free to work on whatever you see fit! Good luck, and we hope you learn something new and have fun working on this project!

Development Requirements and Documentation

The main OS that developers will use is Windows. For pose estimation, developers will utilize the MediaPipe library, and for 3D reconstruction, they will use PIFuHD. Developers will also need libraries like PyTorch if they want to adjust the machine learning models for their use, and applications like Blender and Unreal Engine for improving the 3D modelling of users. Note that tools like Unreal Engine may require large disks for storage and a good GPU to run. An NVIDIA GPU is also helpful for the machine learning models, but it is not strictly necessary. Refer to the Unreal Engine documentation for more information on its requirements.

Additionally, we encourage developers to use our API and libraries below when implementing or changing existing features. These interfaces make it simpler and more efficient to integrate the modular parts of the project together. Developers are also welcome to design their own APIs and libraries and add them to the selection here.

Pose API Documentation

The Pose API is our own Python library that provides a simple interface to start a TCP server and stream real-time pose data to Unreal Engine. It contains easy-to-use functions that integrate into any Python script, and it is designed to be used with the pose estimation models from the Pose team.

Example Usage

# Import the server
from unreal.api.pose_api import UnrealEngineTCPServer

# Create and start the server
server = UnrealEngineTCPServer(host='localhost', port=15815)
server.start()

# Send custom data
my_data = {"bone": (0.5, 1.0, -0.3), "bone2": (0.0, 0.0, 0.0)}
server.send_data(my_data)

# When done
server.stop()

TCPLib Documentation

TCPLib is our own Unreal Engine blueprint function library that provides a simple interface for receiving data over TCP sockets. It sets up a TCP client that connects to a server using the Socketer plugin, which we rebuilt for this version of UE (the plugin is included in the submodule repository), listens for incoming data from the Pose API, and parses the data into Unreal Engine maps. It is designed to be easy to integrate into Unreal Engine Actors so that the character rig can be updated in real time according to the incoming pose data.

ConnectTCP

This is a helper function that sets up a TCP client and connects it to a server. It takes the IP address and port number as input and outputs the socket object and a boolean value indicating whether the connection was successful. It also prints the connection result to the console and screen for debugging purposes. This function is designed to be used in the Unreal Engine event graph, in conjunction with the TCPTick function.

TCPTick

This is the main function of the blueprint function library. It is a blueprint function that runs every tick and checks for incoming data from the TCP socket. It will parse data and output it as a map of keypoints and their coordinates. It is designed to be used in the Unreal Engine event graph, and can be easily integrated into any actor or blueprint.

PrintMapContent

This is a simple function that prints the content of a map to the console. It is useful for debugging and checking the data received from the TCP socket. It takes a map as input and outputs the keypoints and their coordinates to the console. It is used for debugging in the TCPTick function.

CloseTCP

This is a helper function that closes the TCP socket. It takes the socket object as input and outputs a boolean value indicating whether the connection was closed successfully or not. It will also print the disconnection result to the console and screen for debugging purposes. This function is designed to be used in the Unreal Engine event graph, and can be easily integrated into an event of any actor or blueprint.

Deployment and GitHub Workflow

Our team uses GitHub Projects to provide a scheduler that all members can see. Some portions of the project will have multiple team members focusing on them at once; they can work together remotely or split the task up further. Each portion has its own branch, and once a feature is completed, we create a pull request that other team members or our partner can review and approve. This keeps changes in small chunks, which helps prevent large bugs and makes it easier to backtrack to earlier, bug-free versions of the code.

We also have a CI/CD pipeline set up using GitHub Actions to automatically run tests and linting on our code. This will help us ensure that our code is always in a working state and follows the coding standards that we have set for the project. We are also planning a GitHub Action to automatically build the Unreal Engine project and run tests on it.

Coding Standards and Guidelines

We follow coding standards enforced by our automated linter GitHub Actions, and we aim for easy readability with plenty of comments explaining each step. Additionally, we make sure to implement our code efficiently and mainly follow the partner's coding standards.

Licenses

The code is currently licensed under the MIT License. More information can be found in the LICENSE.txt file.
