
An auto-coding tool for Python: an off-brand GitHub Copilot, trained with a GPT-2 transformer on code from public GitHub repositories.


LeonardoGCF/Auto_coding

Auto_coding


An auto-coding tool for Python, an off-brand GitHub Copilot, built on a GPT-2 transformer and fed with code from public GitHub repositories.

It contains a GPT-2 model trained from scratch (not fine-tuned) on Python code from GitHub. The training data totaled about 80 GB of pure Python code, and the current model has made only 2 passes (epochs) through it, so it may benefit greatly from continued training and/or fine-tuning.

The input to the model is code, up to a context length of 1024 tokens.
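Because of the 1024-token window, a long file has to be cut down before it reaches the model. A minimal sketch, assuming we keep only the most recent tokens (the code nearest the cursor) and reserve room for the tokens to be generated; `fit_to_context` and `max_new_tokens` are hypothetical names, not part of this repository.

```python
from typing import List

CONTEXT_LENGTH = 1024  # context window stated above


def fit_to_context(token_ids: List[int], max_new_tokens: int = 64,
                   context_length: int = CONTEXT_LENGTH) -> List[int]:
    """Keep the most recent tokens so that prompt + generation fits the window.

    Hypothetical helper: the repository may handle long inputs differently.
    """
    if not 0 <= max_new_tokens < context_length:
        raise ValueError("max_new_tokens must be in [0, context_length)")
    budget = context_length - max_new_tokens
    # Drop the oldest tokens; the code closest to the cursor matters most.
    return token_ids[-budget:] if len(token_ids) > budget else list(token_ids)
```

The `token_ids` would typically be the list returned by `tokenizer.encode(...)`; truncating from the left rather than the right keeps the part of the file the user is currently editing.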

Demo

Usage guide

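Run the scripts one after another, in the order given by the numbers in their filenames. Don't forget to start test.py first (before you begin typing code)!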
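The numeric-prefix convention can be scripted. A minimal sketch, assuming every step is a file named like `7.use_model.py` (a number, a dot, then the name) in one directory; `ordered_scripts` and `run_in_order` are hypothetical helpers. Since test.py has no numeric prefix and must already be running, it is deliberately left out of the sequence and started separately.

```python
import re
import subprocess
import sys
from pathlib import Path
from typing import List

# Matches names like "7.use_model.py": a leading number, a dot, then the rest.
STEP_PATTERN = re.compile(r"^(\d+)\.(.+)\.py$")


def ordered_scripts(names: List[str]) -> List[str]:
    """Return only the numbered scripts, sorted by their numeric prefix.

    Sorting on int(...) keeps "10.x.py" after "2.x.py", which a plain
    string sort would not.
    """
    numbered = []
    for name in names:
        match = STEP_PATTERN.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


def run_in_order(directory: str = ".") -> None:
    """Run each numbered script with the current interpreter, stopping on failure."""
    folder = Path(directory)
    for name in ordered_scripts([p.name for p in folder.glob("*.py")]):
        print(f"running {name}")
        subprocess.run([sys.executable, name], cwd=folder, check=True)
```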

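Make sure the data files and the model are placed in the expected locations, and that their paths contain no Chinese characters.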
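A quick pre-flight check for the path rule can save a confusing failure later. A minimal sketch, assuming "Chinese characters" means the CJK Unified Ideographs block (U+4E00 to U+9FFF) plus Extension A (U+3400 to U+4DBF); a stricter alternative would be to require pure-ASCII paths. `find_cjk_chars` and `check_path` are hypothetical helpers.

```python
from pathlib import Path
from typing import List, Union

# Assumed ranges: CJK Unified Ideographs Extension A and the main block.
CJK_RANGES = ((0x3400, 0x4DBF), (0x4E00, 0x9FFF))


def find_cjk_chars(path: Union[str, Path]) -> List[str]:
    """Return the CJK ideographs found anywhere in the absolute form of `path`.

    resolve() makes relative paths absolute, so a Chinese parent
    directory (e.g. the user's home folder) is caught as well.
    """
    text = str(Path(path).resolve())
    return [ch for ch in text
            if any(lo <= ord(ch) <= hi for lo, hi in CJK_RANGES)]


def check_path(path: Union[str, Path]) -> None:
    """Raise ValueError if the path breaks the 'no Chinese in paths' rule."""
    bad = find_cjk_chars(path)
    if bad:
        raise ValueError(f"path contains Chinese characters {bad!r}: {path}")
```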

Shortcut

You can also download a ready-made model and run 7.use_model.py and test.py directly, and enjoy low-code coding!

Download link: https://huggingface.co/Sentdex/GPyT/blob/main/pytorch_model.bin
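A `/blob/` URL on the Hugging Face Hub opens the file's web page; the raw file itself is served from the matching `/resolve/` URL. A minimal standard-library sketch of fetching it (`to_resolve_url` and `download` are hypothetical helpers); the `huggingface_hub` package's `hf_hub_download` is an alternative that also caches the file. Where 7.use_model.py expects the file is not stated here, so `dest` is up to you.

```python
import shutil
import urllib.request
from pathlib import Path


def to_resolve_url(blob_url: str) -> str:
    """Turn a Hub '/blob/' page URL into the direct-download '/resolve/' URL."""
    if "/blob/" not in blob_url:
        raise ValueError("expected a URL containing '/blob/'")
    # Only the first occurrence is the route segment; the file path is untouched.
    return blob_url.replace("/blob/", "/resolve/", 1)


def download(blob_url: str, dest: str) -> Path:
    """Stream the file to `dest` without loading it all into memory."""
    target = Path(dest)
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(to_resolve_url(blob_url)) as resp, \
            open(target, "wb") as out:
        shutil.copyfileobj(resp, out)
    return target
```

The weights file is large, which is why it is streamed with `shutil.copyfileobj` instead of read in one call. Loading with `from_pretrained("Sentdex/GPyT")` downloads the model automatically, so a manual download is only needed when the scripts read the file from a local path.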

How it works

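test.py records the user's keyboard input and writes it to keyboard.txt in real time; use_model.py asynchronously reads the contents of that .txt file and makes predictions with the trained model.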
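The reading side of this design can be sketched as a polling loop. A minimal sketch, assuming use_model.py simply re-reads keyboard.txt on a fixed interval and only calls the model when the text has changed; `watch_keyboard_file`, `predict`, and `on_suggestion` are hypothetical names, and the actual script may schedule its reads differently.

```python
import asyncio
from pathlib import Path
from typing import Callable, Optional


async def watch_keyboard_file(
    path: str,
    predict: Callable[[str], str],
    on_suggestion: Callable[[str], None],
    interval: float = 0.5,
    max_polls: Optional[int] = None,
) -> None:
    """Poll `path` and run `predict` whenever its contents change.

    max_polls=None loops forever; a finite value is handy for testing.
    """
    loop = asyncio.get_running_loop()
    last_text: Optional[str] = None
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        try:
            text = Path(path).read_text(encoding="utf-8")
        except FileNotFoundError:
            text = ""  # the recorder has not created the file yet
        if text and text != last_text:
            last_text = text
            # Run the blocking model call in a worker thread so the loop stays responsive.
            suggestion = await loop.run_in_executor(None, predict, text)
            on_suggestion(suggestion)
        await asyncio.sleep(interval)
```

Skipping unchanged text avoids re-running the model on every poll; in practice `predict` would wrap the tokenize/generate/decode steps from the example below.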

Here's a quick example of using this model:

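from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Sentdex/GPyT")
# AutoModelForCausalLM replaces the deprecated AutoModelWithLMHead
model = AutoModelForCausalLM.from_pretrained("Sentdex/GPyT")

# Copy and paste some code in here
inp = """import"""

# The example encodes newlines as <N> before tokenizing
newlinechar = "<N>"
converted = inp.replace("\n", newlinechar)
tokenized = tokenizer.encode(converted, return_tensors="pt")
# generate() returns the prompt followed by the continuation
resp = model.generate(tokenized)

# Decode the first (only) sequence and restore real newlines
decoded = tokenizer.decode(resp[0])
reformatted = decoded.replace(newlinechar, "\n")

print(reformatted)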

Should produce:

import numpy as np
import pytest

import pandas as pd
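The example encodes newlines as `<N>` before tokenizing (presumably the convention used for the training data) and reverses the substitution after decoding. A minimal sketch of those conversions as reusable helpers, plus a trim of the echoed prompt, since `generate` returns the prompt followed by the continuation. The helper names are hypothetical, and the sketch assumes the decoded text contains `<N>` exactly as written; code that already contains a literal `<N>` will not round-trip.

```python
NEWLINE_TOKEN = "<N>"  # placeholder used in the example above


def encode_newlines(code: str) -> str:
    """Normalize Windows line endings, then replace newlines with the placeholder."""
    return code.replace("\r\n", "\n").replace("\n", NEWLINE_TOKEN)


def decode_newlines(text: str) -> str:
    """Restore real newlines in the decoded model output."""
    return text.replace(NEWLINE_TOKEN, "\n")


def strip_prompt(full_output: str, prompt: str) -> str:
    """Return only the continuation when the output starts with the prompt."""
    if full_output.startswith(prompt):
        return full_output[len(prompt):]
    return full_output
```

In the example, this would read `reformatted = decode_newlines(tokenizer.decode(resp[0]))`, and `strip_prompt(reformatted, inp)` would keep only the suggested completion.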
