parsing - Reading a file in chunks and appending the incomplete line to the next read


I am trying to read in the following file:

abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
12345abcdefghijklmnopqrstu
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz

using the code below:

#include <iostream>
#include <fstream>
#include <sstream>
#include <thread>
#include <mutex>
#include <vector>
#include <array>
#include <algorithm>
#include <iterator>

#define chunk_size 55

std::mutex queuedumpmutex;

void getlinesfromchunk(std::vector<char>& chunk, std::vector<std::string>& container)
{
    static std::string str;
    unsigned int i = 0;
    while(i < chunk.size())
    {
        str.clear();
        size_t chunk_sz = chunk.size();

        while(chunk[i] != '\n' && i < chunk_sz )
        {
            str.push_back(chunk[i++]);
        }
        std::cout<<"\nstr = "<<str;

        if (i < chunk_sz)
        {
            std::lock_guard<std::mutex> lock(queuedumpmutex);
            container.push_back(str);
        }
        ++i;
    }
    chunk.clear();
    std::copy(str.begin(), str.end(), std::back_inserter(chunk));
    std::cout << "\nprinting chunk out ....." << std::endl;
    std::copy(chunk.begin(), chunk.end(), std::ostream_iterator<char>(std::cout, " "));
}

void readfileandpopulatedump(std::ifstream& in)
{
    std::vector<char> chunk;
    chunk.reserve(chunk_size*2);
    std::vector<std::string> queuedump;

    in.unsetf(std::ios::skipws);
    std::cout << "chunk capacity: " << chunk.capacity() << std::endl;

    do{
        in.read(&chunk[chunk.size()], chunk_size);
        std::cout << "chunk size before getlines: " << chunk.size() << std::endl;
        getlinesfromchunk(chunk, queuedump);
        std::cout << "chunk size after getlines: " << chunk.size() << std::endl;
    }while(!in.eof());
}

int main()
{
    std::ifstream in("/home/ankit/codes/more_practice/sample.txt", std::ifstream::binary);
    readfileandpopulatedump(in);
    return 0;
}

What I wish to achieve is that the container holds only complete lines.

By that I mean: suppose a read of chunk_size bytes returns only:

abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
12

Then the container should look like:

abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz 

instead of:

abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz12 

Now, I understand that chunk.reserve(chunk_size) only reserves the given amount of memory and does not set the size of the vector. Because of that, I am not able to read into it with in.read().
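
To make the reserve()/resize() distinction concrete, here is a minimal sketch. The helper names readChunk and firstChunkAsText are hypothetical and not part of the code above, and a string stream stands in for the file. The idea is to resize() the buffer before in.read() so that the elements exist, and then shrink it to in.gcount(), the number of bytes actually read.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the question): reads up to `want` bytes and
// returns exactly the bytes that were read.
std::vector<char> readChunk(std::istream& in, std::size_t want)
{
    std::vector<char> buf;
    buf.reserve(want);  // capacity() >= want, but size() is still 0;
                        // writing through &buf[0] here would be undefined behaviour
    buf.resize(want);   // size() == want: the elements exist and may be written
    in.read(buf.data(), static_cast<std::streamsize>(want));
    // read() may deliver fewer bytes at end of file; gcount() says how many.
    buf.resize(static_cast<std::size_t>(in.gcount()));
    return buf;
}

// Usage sketch: a string stream stands in for the file.
std::string firstChunkAsText(const std::string& text, std::size_t want)
{
    std::istringstream in(text);
    const std::vector<char> chunk = readChunk(in, want);
    return std::string(chunk.begin(), chunk.end());
}
```

With this shape the size of the vector always matches the data it holds, so a later loop over chunk.size() does not run over bytes that were never read.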

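If I use chunk.resize(chunk_size), how do I append at the end? I want the remaining characters '12' to be carried over and appended to, so that they become a complete line.

Now the issue is that the code is being repeated more often than it should. According to me, the conditions seem fine.

Any help is appreciated.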

Sorry, but I don't understand why you:

  • read the file in binary mode, and not in text mode
  • don't use getline()
  • use vector<char> instead of std::string

As far as I understand the problem, I propose doing it this way:

#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>

int main()
{
   std::ifstream  f("sample.txt");  // text mode!

   std::size_t const  chunksizemax = 55u;

   std::string  str;    // the line just read
   std::string  chunk;  // complete lines accumulated so far

   while ( std::getline(f, str) )
   {
      // flush the chunk before the next complete line would overflow it
      if ( chunksizemax <= (chunk.size() + str.size()) )
      {
         std::cout << "chunk: [" << chunk << "]\n";

         chunk.clear();
      }

      chunk += str;
   }

   std::cout << "last chunk: [" << chunk << "]\n";

   return EXIT_SUCCESS;
}
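
If reading in fixed-size blocks really is a requirement, the carry-over can be sketched as follows. This is only an illustration under stated assumptions: readLinesInChunks and linesOfSample are hypothetical names, lines are assumed to end in '\n', a final line without a trailing '\n' is assumed to still count as a line, and a string stream stands in for sample.txt. Each block is appended to a pending string, every complete line is moved into the container, and the unfinished tail stays in pending for the next read.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (name and signature are illustrative): reads `in` in
// blocks of at most `chunkSize` bytes and returns only complete lines.
std::vector<std::string> readLinesInChunks(std::istream& in, std::size_t chunkSize)
{
    std::vector<std::string> lines;
    std::string pending;               // unfinished tail of the previous block
    std::vector<char> buf(chunkSize);  // sized (not merely reserved) buffer

    // Loop on gcount() rather than eof(): a short final read is still
    // processed, and a read that delivered nothing is not processed at all.
    while (in.read(buf.data(), static_cast<std::streamsize>(buf.size())),
           in.gcount() > 0)
    {
        pending.append(buf.data(), static_cast<std::size_t>(in.gcount()));

        std::size_t start = 0;
        std::size_t nl;
        while ((nl = pending.find('\n', start)) != std::string::npos)
        {
            lines.emplace_back(pending, start, nl - start);  // line without '\n'
            start = nl + 1;
        }
        pending.erase(0, start);       // keep only the incomplete remainder
    }

    // Assumption: a last line without a trailing '\n' still counts as a line.
    if (!pending.empty())
        lines.push_back(pending);
    return lines;
}

// Usage sketch: the first three lines of the sample file, read 55 bytes at a time.
std::vector<std::string> linesOfSample()
{
    std::istringstream in(
        "abcdefghijklmnopqrstuvwxyz\n"
        "abcdefghijklmnopqrstuvwxyz\n"
        "12345abcdefghijklmnopqrstu\n");
    return readLinesInChunks(in, 55);
}
```

In the usage sketch the first 55-byte read ends after the '1' of the third line; that character stays in pending and is completed by the second read, so the container never holds a partial line.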

Hope this helps.

