Dacon Wallpapering Defect Question-Answering Competition (3): Ko-SOLAR Model Test and Data Parallel

2024. 2. 20. 14:47ㆍDL Life

I'm trying out the Ko-SOLAR model, and it needs far more GPU memory than I expected.

OutOfMemoryError: CUDA out of memory. Tried to allocate 224.00 MiB (GPU 1; 79.35 GiB total capacity; 78.18 GiB already allocated; 165.19 MiB free; 78.20 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

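Since I have access to an A100 server, I'll first try simply getting the model loaded, and then try quantization and similar techniques afterward.
Honestly, this probably isn't the first thing to work on if the goal is better model performance, but I'm experimenting with it as a learning exercise.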
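To get a feel for why an 80 GB card fills up, here is a rough back-of-the-envelope estimate. It is only a sketch under assumptions that are not stated in this post: roughly 10.7B parameters (the size of the SOLAR-10.7B family), full fine-tuning in fp32 with an Adam-style optimizer, and activations ignored entirely. The function name and defaults are made up for illustration.

```python
def training_memory_gib(n_params, bytes_weights=4, bytes_grads=4, bytes_optim=8):
    """Rough memory for weights + gradients + optimizer state, in GiB.

    The defaults assume fp32 weights and gradients plus two fp32 Adam
    moment buffers per parameter. Activations are NOT included.
    """
    total_bytes = n_params * (bytes_weights + bytes_grads + bytes_optim)
    return total_bytes / 1024**3


# Assumption: ~10.7B parameters (not stated in the post for Ko-SOLAR).
N_PARAMS = 10.7e9

fp32_full_finetune = training_memory_gib(N_PARAMS)
fp16_weights_only = N_PARAMS * 2 / 1024**3  # just holding the weights in half precision

print(f"fp32 full fine-tuning (no activations): {fp32_full_finetune:.1f} GiB")
print(f"fp16 weights only: {fp16_weights_only:.1f} GiB")
```

Under these assumptions the training state alone is well beyond a single 80 GB A100, while the half-precision weights by themselves fit comfortably, which is why quantization or parameter-efficient fine-tuning looks like a reasonable next step.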

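The competition allows two A100s, so my goal is to shrink things down to fit that budget.
For now, I'm just trying to get the model onto multiple GPUs.
When I tried Data Parallel in a Jupyter notebook it kept grabbing only GPU 0, so this post is about working through that.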
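A minimal sketch of the usual Data Parallel setup in a notebook, in case it helps to compare against. It assumes the GPUs of interest are indices 0 and 1, and it uses a tiny `nn.Linear` as a stand-in for the real model. One common pitfall is setting `CUDA_VISIBLE_DEVICES` after CUDA has already been initialized in the kernel, so the sketch sets it before importing torch; whether that was the cause here is not confirmed.

```python
import os

# Must happen before CUDA is initialized in this process; in a notebook that
# usually means the first cell, before importing torch. "0,1" is an assumption.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0,1")

import torch
import torch.nn as nn

# Tiny stand-in for the real model, so the sketch runs anywhere.
model = nn.Linear(16, 4)

n_gpus = torch.cuda.device_count()
device = torch.device("cuda:0" if n_gpus > 0 else "cpu")

if n_gpus > 1:
    # Replicates the module on every listed GPU and splits each input batch
    # along dim 0 across them.
    model = nn.DataParallel(model, device_ids=list(range(n_gpus)))

model = model.to(device)

x = torch.randn(8, 16, device=device)  # batch of 8 gets scattered across GPUs
out = model(x)                          # outputs are gathered back on cuda:0
print(n_gpus, tuple(out.shape))
```

Note that `nn.DataParallel` keeps a full copy of the model on each GPU and gathers outputs on the first device, so it spreads the batch rather than the weights. If the model itself does not fit on one card, a different approach (sharding the model or quantizing it) would be needed.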

OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 79.35 GiB total capacity; 77.65 GiB already allocated; 47.19 MiB free; 77.82 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

The immediate problem seems to be that I pushed the entire formatted dataset to the GPU in one go.
I probably need to switch to a DataLoader and feed it in batches.
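A minimal sketch of what batching with a DataLoader could look like. The dataset class, the placeholder strings, and the batch size of 4 are all hypothetical; the real formatted data and tokenization step are not shown in this post.

```python
from torch.utils.data import Dataset, DataLoader


class FormattedTextDataset(Dataset):
    """Hypothetical wrapper around a list of already-formatted prompt strings."""

    def __init__(self, texts):
        self.texts = list(texts)

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        return self.texts[idx]


# Placeholder data standing in for the formatted question/answer strings.
texts = [f"Question {i} / Answer {i}" for i in range(10)]

loader = DataLoader(FormattedTextDataset(texts), batch_size=4, shuffle=False)

# Each iteration yields a list of at most 4 strings; only that batch would be
# tokenized and moved to the GPU, instead of the whole dataset at once.
batch_sizes = [len(batch) for batch in loader]
print(batch_sizes)
```

With this shape, peak GPU memory for the inputs scales with the batch size rather than with the dataset size.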

I asked GPT and changed the code accordingly. The model now loads, but a computation error came up. How do I fix this one?

RuntimeError: grad can be implicitly created only for scalar outputs
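PyTorch raises this error when `backward()` is called on a tensor with more than one element. One likely cause in this setting, assuming the loss is computed inside the model wrapped by `nn.DataParallel`, is that each GPU replica returns its own loss and the gathered result is a vector with one entry per GPU. The sketch below reproduces the error on a stand-in tensor (the two values are made up) and shows the common workaround of reducing to a scalar first.

```python
import torch

# Stand-in for the losses gathered from two replicas (values are made up).
per_replica_loss = torch.tensor([0.7, 0.9], requires_grad=True)

message = ""
try:
    # backward() without an explicit gradient only works on scalars.
    per_replica_loss.backward()
except RuntimeError as err:
    message = str(err)

print(message)

# Common workaround: reduce to a scalar before calling backward().
loss = per_replica_loss.mean()
loss.backward()
print(per_replica_loss.grad)
```

In a training loop this would correspond to calling `.mean()` on the loss returned by the wrapped model before `.backward()`.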

--- in progress

Reference

#1 https://artiiicy.tistory.com/61

“This post was written in Obsidian and then uploaded.”
