KMS13 : C++(Hurffman code)을 이용한 txt 파일 압축기 (source)

Program_Light 2021. 9. 4. 13:00

이번에는 지난달에 올린 Huffman code를 이용한 영문 텍스트 파일을 모든 txt파일에서 사용할 수 있게 업그레이드 하는 동시에, 간단한 최적화를 진행해봤습니다. (물론 아직도 완벽한 최적화라 하기는 어렵지만요...)

technonogy

지난 소스에서 사용한 최고의 기술은 huffman code의 string length만을 가지로 huffman code를 만들어내는 Binary tree를 기반으로 구현한, Huffman code length 만으로 코드를 복구하는 프로세스인데, 이 방식에 약간의 메모리 최적화를 진행했습니다.

지난 소스에서도 설명했듯 허프만 코드는 기본적으로 이진 트리 위에서 구현됩니다. 허프만 코드를 저장하는 이진 트리의 특징을 살려 코드의 길이만으로 코드를 최적화해내는 프로세스를 만들었는데요, 이번 소스에서는 기존 소스에서 작성되있던 Btree node의 구성요소 top(부모의 주소를 저장하던 변수)를 제거하고, 내신 decompress함수에서 하나의 stack을 만들어냄으로써 top의 역할을 대신하게 만들었습니다.

뭐 추가적으로 몇가지 최적화를 진행했는데, 딱히 눈에 띄는 기술을 없으므로 넘어갈꼐요.

source

main.cpp (이번에 파일분할 할까 생각해보긴 했는데... 좀 아닌것 같애서 안했습니다.)

#include <iostream>
#include <fstream>
#include <sstream>

#include <algorithm>
#include <vector>
#include <stack>

using namespace std;

/*
* [file construction]
* contents file is constructed by [size of key file, [key file], [contents file]]
* size of key file means the number of element in key file, which can expressed in two pair of caracter
* it also means the number of Huffman code
*/

/*
* [key file construction]
*
* key file is constructed by loop of [character, code length]
*
* specially, key file include not code but code length
* since In this program, Huffman code will be formed, sorted regularly,
* so code can be rebuilt by code length
*
* character and code length is just one char valueable, so it can read and write easily
*
* (It's definitely not that code length is bigger than 127,
* since this program apply 128 character, so Huffman code cannot exceed 127)
*/

/*
* [data file construction]
*
* data file is constructed by one number and successive code list which is converted from original file
* file's first number is "length of successive code %(mod) 8", this number help program to read file correctly
* and the others are part for save code list.
* (code will stored in reverse state due to convenient of reading/writing)
*
* for instance, if code of 'a', 'b', 'c', is "0110", "1011" and "110", "accbac" will be converted like:
*           01101101 10101101 10110, and expressed like:
*           00000101 10110110 10110101 00001101
*/

namespace KmsStd {
    typedef unsigned char uchar;
    typedef long long intll;

    enum { FILE_CON_ERR = 0x11, FILE_EMPTY_ERR = 0x12, SUCCESS = 0x00 };

    template<typename T>
    std::ostream& write_typed_data(std::ostream& stream, const T& value) {
        return stream.write(reinterpret_cast<const char*>(&value), sizeof(T));
    }
    template<typename T>
    std::istream& read_typed_data(std::istream& stream, T& value) {
        return stream.read(reinterpret_cast<char*>(&value), sizeof(T));
    }

    /* ////////////////////////////// pressing ////////////////////////////// */
    namespace prs {
        typedef pair < pair < intll, uchar >, string > TYP1;

        bool CLC1(TYP1 a, TYP1 b) { return a.second < b.second; }
        void CLC2(string& BPF, ofstream& fout, int cycle, uchar ODF = 0) {
            for (int j = 0; j < cycle; j++) { ODF |= (unsigned char)(BPF[j] - '0') << j; }
            write_typed_data(fout, ODF);
        }

        //input file name, key file name, data file name
        int compression(string IFN = "input.txt", string DFN = "data") {
            ifstream fin;

            string IFC; //input file contents
            stringstream ss;
            int IFS; //input file size

            int IIV[256]; //index in vector (value range = -1 ~ 255)
            int TIC; //temporary index data for calculation (vlaue range = 0 ~ 255)
            uchar TDC; //temporary data for calculation

            int vector_size;
            pair<int, intll> CM; //calculation supporter about constructing minimun value
            intll CS; //calculation support valuable

            ofstream fout;

            intll PFS; //predicted pressed file size (just in range of data file)
            intll PWS; //predicted whole file size

            vector < TYP1 > code; //use_count, char, code
            string BPF; //buffer for writing file

            /* read file */
            fin.open(IFN);
            if (!fin) { return FILE_CON_ERR; }
            ss << fin.rdbuf();
            IFC = ss.str();
            IFS = IFC.size();
            if (!IFS) { return FILE_EMPTY_ERR; }
            fin.close();

            /* caclulate use count */
            fill_n(IIV, 256, -1);
            TIC = 0;
            for (int i = 0; i < IFS; i++) {
                TDC = (uchar)IFC[i];
                if (IIV[TDC] ^ -1) {
                    code[IIV[TDC]].first.first++;
                }
                else {
                    IIV[TDC] = TIC++;
                    code.push_back(make_pair(make_pair(1, TDC), ""));
                }
            }
            cout << "get file & organize file successfully ( " << IFC.size() * 8 << " bit )" << endl;

            sort(code.rbegin(), code.rend());

            /* construct huffman code */
            code[0].second = "0";
            if (code.size() > 1) { code[1].second = "1"; }

            vector_size = code.size();
            for (int i = 2; i < vector_size; i++) {
                CM = make_pair(-1, LLONG_MAX);
                for (int j = 0; j < i; j++) {
                    CS = code[j].first.first + (code[i].first.first * (code[j].second.size() + 1));
                    if (CS < CM.second) { CM.first = j; CM.second = CS; }
                }
                code[i].second = code[CM.first].second + "0";
                code[CM.first].second += "1";
            }
            sort(code.begin(), code.end(), CLC1);

            /* construct key file & calculate pressed file size*/
            fout.open(DFN, std::ios::binary);
            if (!fout) { return FILE_CON_ERR; }
            PFS = 0;
            write_typed_data(fout, (char)vector_size);//
            for (int i = 0; i < vector_size; i++) {
                write_typed_data(fout, code[i].first.second);
                write_typed_data(fout, (char)code[i].second.size());

                PFS += code[i].first.first * code[i].second.size();
                IIV[code[i].first.second] = i;
            }
            PWS = PFS + 8 + (16 * vector_size);
            cout << "construct huffman code & key file successfully ( " << PWS << " bit )" << endl;
            cout << "  { " << (double)PWS * 12.5 / IFS << " % }" << endl;

            /* construct data file */
            BPF = "";
            write_typed_data(fout, (unsigned char)((PFS - 1) % 8 + 1));
            for (int i = 0; i < IFS; i++) {
                BPF += code[IIV[(uchar)IFC[i]]].second;
                while (BPF.size() >= 8) {
                    CLC2(BPF, fout, 8);
                    BPF.erase(0, 8);
                }
            }
            if (BPF.size()) { CLC2(BPF, fout, BPF.size()); }
            fout.close();

            return SUCCESS;
        }
    }


    /* ////////////////////////////// extending ////////////////////////////// */
    namespace ext {
        enum { LEFT, RIGHT };
        //binary tree node | this class is unsafe to use freely. check the conditions to make functions work normally.
        class BTnode {
        public:
            uchar data;
            //lft node's code = this node's code + '0'
            BTnode* lft;
            //rht node's code = this node;s code + '1'
            BTnode* rht;

            BTnode(uchar data = 0x0, BTnode* left = nullptr, BTnode* right = nullptr) {
                this->data = data;
                this->lft = left;
                this->rht = right;
            }

            ~BTnode() {
                if (this->lft != nullptr) { delete(this->lft); }
                if (this->rht != nullptr) { delete(this->rht); }
            }

            //binary tree node connectioon unsafe
            void conU(const int dir, const uchar V = 0x0) {
                BTnode* lt = new BTnode(V);
                if (dir == LEFT) { this->lft = lt; }
                else { this->rht = lt; }
            }

            //is node filled?
            bool isF(void) {
                if ((this->lft != nullptr) && (this->rht != nullptr)) { return true; }
                else { return false; }
            }
        };

        //calculation 1 (cur and BTR must be initalized and located at one modified binary tree. fout must be initalized, opened)
        void CLC1(const uchar IPI, BTnode*& cur, BTnode*& BTR, ofstream& fout, int cycle = 8) {
            for (int i = 0; i < cycle; i++) {
                if (IPI & (1 << i)) { cur = cur->rht; }
                else { cur = cur->lft; }

                if (cur->data) {
                    fout.put(cur->data);
                    cur = BTR;
                }
            }
        }

        //result file name, key file name, data file name
        int decompression(string RFN = "output.txt", string DFN = "data") {
            ifstream fin;
            ofstream fout;

            uchar IHS; //input Huffman code size
            uchar IPC; //input character
            uchar SKF; //size of key file
            bool LVO; //logic valuable for optimization

            BTnode* BTR = new BTnode; //binary tree root node
            stack<BTnode*> CAL; //current address location of BTnode tree
            BTnode* CUR; //current location of BTnode tree
            int DPT; //depth of tree

            uchar LFS; // last file size
            uchar IPI; //input string data
            uchar BIPI; //before input string data

            /* read key file & construct BT tree */
            fin.open(DFN, std::ios::binary);
            if (!fin) { return FILE_CON_ERR; }
            if (fin.eof()) { return FILE_EMPTY_ERR; }

            DPT = 0;
            read_typed_data(fin, SKF);
            if (!SKF) { SKF = 256; }
            CAL.push(BTR);
            for (int k = 0; k < SKF; k++) {
                read_typed_data(fin, IPC);
                read_typed_data(fin, IHS);

                CUR = CAL.top();
                LVO = CUR->isF();
                while ((DPT < (IHS - 1)) || LVO) {
                    if (LVO) { CAL.pop(); DPT--; }
                    else {
                        if (CUR->lft == nullptr) {
                            CUR->conU(LEFT);
                            CAL.push(CUR->lft);
                        }
                        else {
                            CUR->conU(RIGHT);
                            CAL.push(CUR->rht);
                        }
                        DPT++;
                    }
                    CUR = CAL.top();
                    LVO = CUR->isF();
                }

                if (CUR->lft == nullptr) { CUR->conU(LEFT, IPC); }
                else { CUR->conU(RIGHT, IPC); }
            }
            cout << "read / construct Huffman code tree sucessfully" << endl;

            /* read data file & make extended file */
            fout.open(RFN);
            if (!fout) { return FILE_CON_ERR; }

            CUR = BTR;
            read_typed_data(fin, LFS);
            read_typed_data(fin, IPI);
            for (;;) {
                BIPI = IPI;
                read_typed_data(fin, IPI);
                if (fin.eof()) { break; }
                CLC1(BIPI, CUR, BTR, fout);
            }
            CLC1(BIPI, CUR, BTR, fout, LFS);
            delete BTR;

            fin.close();
            fout.close();
            cout << "write extended file successfully" << endl;

            return SUCCESS;
        }
    }
}

int FNC1(int RST) {
    switch (RST) {
    case KmsStd::FILE_CON_ERR:
        cout << "file connect error\n";
        return 1;
    case KmsStd::FILE_EMPTY_ERR:
        cout << "file empty error\n";
        return 1;
    case KmsStd::SUCCESS:
        cout << "success\n";
        return 0;
    default:
        cout << "unknown error\n";
        return 1;
    }
}

int main() {
    int INP, RST; //input, result status
    string FNM; //file location + name + extention

    cout << "copyright by 2021 KMS studio"  << endl;
    for (;;) {
        cout << "\n     [  compression : 1  |  decompression : 2  |  auto compression & decompression : 3  |  exit : 4  ]\n";
        do {
            cout << "->";
            cin >> INP;
        } while ((INP < 1) || (INP > 4));

        if (INP == 1) {
            cout << "     input file name\n->";
            cin >> FNM;
            if (FNM == "auto") { FNM = "input.txt"; }
            RST = KmsStd::prs::compression(FNM);
            FNC1(RST);
        }
        else if (INP == 2) {
            cout << "     result file name\n->";
            cin >> FNM;
            if (FNM == "auto") { FNM = "output.txt"; }
            RST = KmsStd::ext::decompression(FNM);
            FNC1(RST);
        }
        else if (INP == 3) {
            RST = KmsStd::prs::compression();
            if (!FNC1(RST)) {
                cout << '\n'; //cout << endl;
                RST = KmsStd::ext::decompression();
                FNC1(RST);
            }
            else { cout << "connot perform decompression : fail to press input file\n"; }
        }
        else { break; }
    }
    return 0;
}

실행방법

1) 압축

사용자는 프로그램 실행 후, 1을 입력합니다.
이후 "input file name"이라는 메세지가 나오면, 압축하고자 하는 영문 텍스트 파일의 이름을 위치와 확장자인 ".txt"를 포함하여 입력합니다. ex ) C:\Users\mskang\Desktop\input.txt
입력후, 프로그램은 압축을 진행합니다. 압축된 파일의 데이터는 "data"라는 이름으로 저장됩니다. 위치는 프로그램이 있는 위치입니다.

* "auto"를 입력하시면 "input.txt"로 인식합니다.

2) 압축 해제

사용자는 반드시 1번 과정을 통해 data파일을 소스파일이 있는 위치에 저장하고 있어야 합니다.
프로그램 실행 후, 2를 입력합니다.

이후 "result file name"이라는 메세지가 나오면, 압축 해제된 데이터가 저장될 텍스트 파일의 이름음 위치와 확장자인 ".txt"를 포함한 영문 텍스트 파일을 입력합니다. ex) C:\Users\mskang\Desktop\result.txt
입력후, 프로그램은 압축 해제를 진행합니다. 결과 텍스트 파일은 사용자가 입력한 경로에 저장될 것이며, 경로가 없다면 프로그램이 있는 위치에 파일이 저장될 것입니다.

* "auto"를 입력하시면 "output.txt"로 받아들입니다.

3) 기타 기능

프로그램 소스가 있는 파일에 "input.txt"라는 이름을 가진 텍스트 파일이 있어야 합니다.

프로그램 실행 후, 3을 입력합니다.

프로그램이 자동으로 프로그램이 있는 위치의 "input.txt"파일을 압축하고, 압축 결과물들을 압축 해제한 뒤, 결과물을 프로그램이 있는 위치의 "output.txt"에 저장합니다. 이 기능은 프로그램의 신뢰성을 테스트하기 위해 개발되었습니다.

* 4를 입력하면 종료됩니다.

* 모든 파일 이름들에는 띄어쓰기가 없어야 합니다.

* 함수 선언 형태를 보시면 알겠지만, main함수를 개량하면 key file, data file의 위치, 이름까지 직접 지정하실 수 있습니다. 개량은 상당히 간단하기 떄문에, 개량은 직접 하시길 바랍니다.
이미 함수부에 많은 개량을 적용한 상태라 상당히 지치는군요.

실행파일

사실 실행파일을 블로그에 올리는 것은 바람직하지 않다 생각합니다. 지난번 글에서 깨달았는데, .exe파일을 웹에서 다운 받은 후, 실행할려고 하면 온갖 보안 경고가 다 뜹니다.

그래서 저는 소스코드를 다운로드하신 후 실행파일은 직접 만드시기를 바라며, 소스코드와 테스트 데이터만 올려드리도록 하겠습니다.

* 참고로 UI는 없습니다. 사용방법만 보셔도 감이 오죠?

KMS 13소스 전문

https://github.com/KMSstudio/KMS13-compression-program-based-on-Huffman-code

GitHub - KMSstudio/KMS13-compression-program-based-on-Huffman-code: https://kms-program.tistory.com/30?category=456666

https://kms-program.tistory.com/30?category=456666 - GitHub - KMSstudio/KMS13-compression-program-based-on-Huffman-code: https://kms-program.tistory.com/30?category=456666

github.com

KMS 13 테스트 자료 (세계 인권 선언문 전문)

https://drive.google.com/file/d/1yp2zJXdGFbDXbokHd-vN3oSZ6EU1tsN3/view?usp=sharing

KMS13_input.txt

drive.google.com

참고 : 실행화면 영상(과거 버전의 실행영상입니다.)

https://www.youtube.com/watch?v=PpgypqE-cOU

이상으로 KMS였습니다. 감사합니다.

'Program_Light' 카테고리의 다른 글

KMS14 : Python을 이용한 simple rotatable 3D 구현 프로그램 (0)	2021.11.06
KMS14 : Python을 이용한 simple rotatable 3D 구현 프로그램 (source) (0)	2021.10.02
KMS13 : C++(Hurffman code)을 이용한영문 txt 파일 압축기 (source) (0)	2021.08.07
KMS12 : Python PyQt5를 이용한 줌 반자동 참가기 (source) (0)	2021.07.03
KMS12 : Python PyQt5를 이용한 줌 반자동 참가기 (source) (0)	2021.06.05

ABOUT ME

KMS_studio programming KMS_studio programming

technonogy

source

실행방법

실행파일

'Program_Light' 카테고리의 다른 글

티스토리툴바

ABOUT ME

technonogy

source

실행방법

실행파일

'Program_Light' 카테고리의 다른 글

관련글 관련글 더보기

티스토리툴바