In this article we’re going to talk about file reading in Java as well as its APIs and packages and the pros and cons of each approach. Additionally, for each section, we will implement a short program to read files using that API.
1. Overview
Without a doubt, reading and writing files is an integral part of any type of software development. Therefore, understanding which API is best for each type of file is essential. Thankfully, Java offers multiple APIs out of the box that cover most development needs.
2. Introduction to Java IO API
In essence, Java offers two types of streams for handling files, namely:
- Byte Streams
- Character Streams
We use byte streams to read binary data such as image, video, audio, or zip files. They can also read any kind of text file, but keep in mind that what you get back is still bytes; if you want readable text, you have to decode them into characters. Note that byte stream classes end with InputStream, the name of their abstract base class.
Alternatively, we use character streams to read text files such as TXT, CSV, XML, or any other kind of plain-text file. (PDF is a binary format, so it belongs with the byte streams.) Character streams make handling text easier because they decode bytes into characters for us, and some of their classes add convenience methods. For example, BufferedReader provides String readLine() to read an entire line of text and Stream<String> lines(), which exposes the lines of the file as a stream of strings. Similarly, character stream classes end with Reader, the name of their abstract base class.
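The two methods mentioned above can be tried without touching the file system. The following is a minimal sketch, assuming a StringReader as a stand-in for a real file; the class name LineReadingSketch and its methods are hypothetical and not part of the article's project.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; a StringReader stands in for a file-backed Reader.
final class LineReadingSketch {

    // Reads only the first line; readLine() returns null when there is nothing to read.
    static String firstLine(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Collects every line through the Stream<String> returned by lines().
    static List<String> allLines(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.lines().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "first line\nsecond line\nthird line";
        System.out.println(firstLine(text));
        System.out.println(allLines(text));
    }
}
```

Both methods strip the line terminator, so the strings you get back contain only the text of each line.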
Let’s take a look at some of the classes Java offers for reading byte streams.
2.1 Resources and Files
In order for this tutorial to work, it will require extra files to interact with. Thus, we will provide some extra files that will be added to the project root folder. Bear in mind that any extra files added will be available for download.
This is a piece of text added to help you perform all the operations in this article. Feel free to use any other resources of your preference. But, do not forget to provide the right location for it. The content here has no specific order nor does it have any purpose besides serving as a resource. Again, feel free to replace the content of this file as you please, but keep in mind it'll alter the results. Now that you are all set up, let's start coding and have some fun with Java.
Firstly, we will create a folder named resources to hold all files. Then, we’ll add a File type variable that represents the file and its path in the file system. Finally, we will add a static method getTextFileLocation() to return it.
Note that the path in File is immutable, so once you set a path in the constructor, it cannot be changed. Thus, for every new file we will create a new final instance of File.
    import java.io.File;

    public class Helper {
        private static final File file = new File("resources/text_file.txt");

        public static File getTextFileLocation() {
            return file;
        }
    }
Alternatively, for every constructor that takes an InputStream, you can pass System.in, as it is a static final reference to an input stream which is already open and ready to supply input data, typically from a keyboard. However, for the purposes of the article, the focus will be on working with files rather than keyboard input.
2.2 Simplifying Code with Reusable Helper Methods
Throughout this article, we will run into boilerplate code that repeats from one example to the next. To avoid such repetition and keep the examples concise, we will use helper methods whenever appropriate, for instance when printing text to the terminal. However, please note that this article is NOT about refactoring, algorithm design, or design patterns.
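    // Lives in the TerminalPrinter class used by the examples below.
    public static void print(InputStream in) throws IOException {
        int datum;
        // Prints every byte as one char until the stream is exhausted.
        while ((datum = in.read()) != -1) {
            System.out.print((char) datum);
        }
        System.out.println();
    }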
2.3 File Class Essential Methods
When working with files in Java, we are often unsure whether a file exists or whether we have permission to read from or write to it. This is where the File class comes into play: it provides methods such as exists(), length(), createNewFile(), canRead(), and canWrite(), among others, to address these uncertainties.
These methods allow us to programmatically determine if a file exists, check if we have permission to read its contents, or verify if we can modify the file. By utilizing these methods, we can gracefully handle different scenarios while reading from or writing to files. Note, however, that this article only touches upon the File class; its primary focus lies on the stream classes that read the contents of a file.
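As a rough illustration of these checks, here is a minimal sketch that inspects a file before any stream is opened. The class FileChecksSketch and the describe() method are hypothetical names, and a temporary file stands in for resources/text_file.txt.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; names are illustrative only.
final class FileChecksSketch {

    // Summarizes what the File methods report, without opening the file.
    static String describe(File file) {
        if (!file.exists()) {
            return file.getName() + ": missing";
        }
        return file.getName() + ": " + file.length() + " bytes"
                + ", readable=" + file.canRead()
                + ", writable=" + file.canWrite();
    }

    public static void main(String[] args) {
        try {
            // A temporary file stands in for resources/text_file.txt.
            File temp = File.createTempFile("file-checks", ".txt");
            temp.deleteOnExit();
            System.out.println(describe(temp));
            // createNewFile() returns false when the file already exists.
            System.out.println("created again? " + temp.createNewFile());
            System.out.println(describe(new File("no_such_file.txt")));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keep in mind that these checks are only a snapshot: the file can still change or disappear between the check and the read, so the IOException handling around the streams remains necessary.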
3. The Byte Stream Classes in Java I/O
In this section, we will cover the classic byte stream classes from java.io, most of which have been available since Java 1. The classes from java.nio (New Input Output) won’t be covered in this article.
Note that a byte stream hands you one byte at a time and knows nothing about characters. Therefore, if you read a text whose encoding uses 2 or more bytes for a character, as UTF-8 and UTF-16 do for many characters, the InputStream will read them as separate bytes, and printing each byte as a character will misrepresent your text. Alternatively, you can write code to convert the bytes into characters using the correct encoding.
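To make the difference concrete, here is a minimal sketch that compares the two approaches on an in-memory byte array. The class and method names are hypothetical, and the sample word is written with Unicode escapes so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
final class DecodeBytesSketch {

    // Misleading for multi-byte text: treats every byte as its own character.
    static String byteByByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append((char) (b & 0xff));
        }
        return sb.toString();
    }

    // Decodes the whole array with the encoding it was written in.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "programação": 11 characters, two of which need 2 bytes each in UTF-8.
        String word = "programa\u00e7\u00e3o";
        byte[] bytes = word.getBytes(StandardCharsets.UTF_8);
        System.out.println("bytes: " + bytes.length);
        System.out.println("byte by byte: " + byteByByte(bytes).length() + " chars");
        System.out.println("decoded equals original: " + decodeUtf8(bytes).equals(word));
    }
}
```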
3.1 The Piped Input Stream Class
The PipedInputStream class is suitable for scenarios where you need to pass data between threads; a good example would be a Producer/Consumer application. Furthermore, you can also use it for testing purposes. If you want to learn more about Threads and Producer and Consumer, check out our article on Multithreading.
3.1.1 The Piped Input Stream Producer Method
    private static Runnable getPipedOutputStreamRunnable(PipedOutputStream out) {
        return () -> {
            // try-with-resources was introduced in JDK 7; listing the existing
            // variable "out" as a resource requires JDK 9 or above.
            try (out; Scanner scan = new Scanner(Helper.getTextFileLocation())) {
                while (scan.hasNextLine()) {
                    out.write(scan.nextLine().getBytes());
                    if (scan.hasNextLine()) {
                        out.write("\n".getBytes());
                    }
                    try {
                        Thread.sleep(5000);
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                }
                out.flush();
            } catch (IOException e) {
                e.printStackTrace();
            }
        };
    }
Note that the above method returns a Runnable object and takes a PipedOutputStream object, which we will write data into.
To begin with, we loop through our text file, line by line, using the Scanner class. Then we write each line, followed by a new-line character if more lines remain, into out. Finally, on every loop iteration, our thread sleeps for about 5 seconds before it loops again.
Bear in mind that try-with-resources requires JDK 7 or above, and that naming an already declared variable such as out in the resource list, as we do here, requires JDK 9 or above.
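The difference between the two try-with-resources forms can be seen in isolation. The following is a small sketch with a hypothetical Tracked resource that records when it is closed; none of these names come from the article's project.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; names are illustrative only.
final class TryWithResourcesSketch {

    // Records when it is closed so the closing order becomes visible.
    static final class Tracked implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Tracked(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add("closed " + name);
        }
    }

    static List<String> run() {
        List<String> log = new ArrayList<>();
        Tracked existing = new Tracked("existing", log);
        // JDK 9+: "existing" is effectively final, so it can be listed by name.
        // On JDK 7 or 8 you would write: try (Tracked alias = existing; ...)
        try (existing; Tracked fresh = new Tracked("fresh", log)) {
            log.add("body");
        }
        return log;
    }

    public static void main(String[] args) {
        // Resources are closed in reverse order of declaration.
        System.out.println(run()); // prints [body, closed fresh, closed existing]
    }
}
```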
3.1.2 The Piped Input Stream Consumer Method
private static Runnable getPipedInputStreamRunnable(PipedOutputStream out) { return () -> { try (PipedInputStream in = new PipedInputStream(out)) { TerminalPrinter.print(in); } catch (IOException e) { e.printStackTrace(); } }; }
Similarly, the above method returns a Runnable object and takes a PipedOutputStream object. But this time, we use this stream to connect to the PipedInputStream, so that every time the output stream writes data, the input stream reads and prints it to the terminal.
public static void readUsingPipedInputStream() { PipedOutputStream out = new PipedOutputStream(); Runnable runnableOut = getPipedOutputStreamRunnable(out); new Thread(runnableOut).start(); Runnable runnableIn = getPipedInputStreamRunnable(out); new Thread(runnableIn).start(); }
Finally, we must create two Runnable objects and start a Thread for each. As a result, it will print a line of our text file to the terminal every five seconds (give or take).
This is a piece of text added to help you perform all the operations in this article. Feel free to use any other resources of your preference. But, do not forget to provide the right location for it. The content here has no specific order nor does it have any purpose besides serving as a resource. Again, feel free to replace the content of this file as you please, but keep in mind it'll alter the results. Now that you are all set up, let's start coding and have some fun with Java.
3.2 The File Input Stream Class
The FileInputStream class is designed to read streams of raw bytes, for instance image data, from the file system. Note that it is unbuffered: every time read() is invoked, it will perform a system call, which can be expensive in terms of performance.
public static void readUsingFileInputStream() { try (InputStream in = new FileInputStream(Helper.getTextFileLocation())) { TerminalPrinter.print(in); } catch (IOException e) { e.printStackTrace(); } }
As you can see, just a few lines of code were needed in order to read an entire file and print it to the terminal.
This is a piece of text added to help you perform all the operations in this article. Feel free to use any other resources of your preference. But, do not forget to provide the right location for it. The content here has no specific order nor does it have any purpose besides serving as a resource. Again, feel free to replace the content of this file as you please, but keep in mind it'll alter the results. Now that you are all set up, let's start coding and have some fun with Java.
According to our testing, performing the previous operation resulted in 491 system calls. Also, it took about 12 milliseconds to complete.
3.3 The Buffered Input Stream Class
Similarly, the BufferedInputStream also reads streams of raw bytes, but instead of fetching one byte at a time from the underlying stream, it fetches them in chunks of 8192 bytes by default. Note that this size is customizable via the constructor that takes two parameters, an InputStream and an int size.
Also, keep in mind that BufferedInputStream is a subclass of FilterInputStream, which adds extra functionality to a regular InputStream.
Indeed, this approach requires one more step compared to the previous one. That is, you must provide BufferedInputStream with an InputStream. Nevertheless, the performance boost it offers makes it a worthwhile choice.
    public static void readUsingBufferedInputStream() {
        // Note that it requires an InputStream object which requires a File object.
        try (InputStream in = new BufferedInputStream(new FileInputStream(Helper.getTextFileLocation()))) {
            TerminalPrinter.print(in);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
This time our testing resulted in just 3 system calls and it took about 5 milliseconds to complete. In other words, the run time dropped by roughly 60%. However, because it keeps a whole chunk of bytes in memory, this approach will consume more memory.
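The effect of buffering can also be observed without measuring system calls. The sketch below is an approximation under stated assumptions: a hypothetical CountingInputStream counts how often the wrapped stream is asked for data, and an in-memory array stands in for the file, so the numbers it prints are not the system-call figures quoted above.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class; names are illustrative only.
final class BufferingSketch {

    // Counts how often the wrapped stream is actually asked for data.
    static final class CountingInputStream extends FilterInputStream {
        int calls;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    // Drains the stream one byte at a time, like the print helper does.
    private static void drain(InputStream in) {
        try {
            while (in.read() != -1) {
                // discard the byte
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static int callsUnbuffered(byte[] data) {
        CountingInputStream source = new CountingInputStream(new ByteArrayInputStream(data));
        drain(source);
        return source.calls;
    }

    static int callsBuffered(byte[] data, int bufferSize) {
        CountingInputStream source = new CountingInputStream(new ByteArrayInputStream(data));
        // The second constructor argument overrides the default buffer size.
        drain(new BufferedInputStream(source, bufferSize));
        return source.calls;
    }

    public static void main(String[] args) {
        byte[] data = new byte[10_000];
        System.out.println("unbuffered calls: " + callsUnbuffered(data));
        System.out.println("buffered calls:   " + callsBuffered(data, 4096));
    }
}
```

Without the buffer, every single read() reaches the source; with it, the source is only consulted when the buffer runs empty.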
3.4 The Sequence Input Stream Class
The SequenceInputStream class is designed to read two or more InputStream objects in sequence, hence its name. Besides the constructor that takes two streams, it provides a constructor that takes an Enumeration of input streams, which you can obtain from a Vector, for example. Note that it does not read them in parallel, but rather one after the other.
For this example, we’ll add 3 smaller text files to be read in sequence. So, the first file contains the first two lines of our text_file.txt, the second file has the third and fourth lines, and the third file holds the last line. Note that we will work with 3 files, but the constructor takes only 2 streams, so we will chain two SequenceInputStream objects.
    public static void readUsingSequenceInputStream() {
        try (InputStream in1 = new FileInputStream("resources/text_part1.txt");
             InputStream in2 = new FileInputStream("resources/text_part2.txt");
             InputStream in3 = new FileInputStream("resources/text_part3.txt")) {
            SequenceInputStream seq = new SequenceInputStream(in1, in2);
            // Note that here we pass SequenceInputStream + FileInputStream
            SequenceInputStream sequence = new SequenceInputStream(seq, in3);
            TerminalPrinter.print(sequence);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
Alternatively, you can use the Vector, Stack, or Hashtable classes (holding InputStream objects), which provide an elements() method that returns an Enumeration<E>. Then you can pass it as a parameter to the SequenceInputStream constructor. Arrays and lists can also be converted into an Enumeration.
Enumeration<InputStream> elements = Collections.enumeration(Arrays.asList(myArray)); // For arrays.
Similarly, for a list:
Enumeration<InputStream> elements = Collections.enumeration(myList); // For lists
Optionally, we can rewrite the previous example using the Vector class:
    public static void readUsingSequenceInputStreamVector() {
        Vector<InputStream> vector = new Vector<>(3);
        // Also, it works with different InputStream implementations.
        try (InputStream in1 = new FileInputStream(Helper.getTextFileFrag_1());
             InputStream in2 = new FileInputStream(Helper.getTextFileFrag_2());
             InputStream in3 = new BufferedInputStream(new FileInputStream(Helper.getTextFileFrag_3()))) {
            vector.addAll(Arrays.asList(in1, in2, in3));
            SequenceInputStream sequence = new SequenceInputStream(vector.elements());
            TerminalPrinter.print(sequence);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
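The list-based variant mentioned earlier can be sketched in the same way. This is a minimal example, assuming in-memory ByteArrayInputStream objects in place of the three file fragments; the class SequenceSketch and its concat() method are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical demo class; in-memory streams stand in for the file fragments.
final class SequenceSketch {

    // Reads all parts one after the other through a single SequenceInputStream.
    static String concat(String... parts) {
        List<InputStream> streams = new ArrayList<>();
        for (String part : parts) {
            streams.add(new ByteArrayInputStream(part.getBytes(StandardCharsets.US_ASCII)));
        }
        // Any collection can be adapted to the Enumeration the constructor expects.
        Enumeration<InputStream> all = Collections.enumeration(streams);
        StringBuilder sb = new StringBuilder();
        try (InputStream in = new SequenceInputStream(all)) {
            int datum;
            while ((datum = in.read()) != -1) {
                sb.append((char) datum);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(concat("first part, ", "second part, ", "third part"));
    }
}
```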
3.5 The Byte Array Input Stream Class
So far, we have dealt with scenarios where the file is in some sort of storage medium. However, there may be situations where the data you need to handle is already in memory. That is, it is already an array of bytes. This is what the ByteArrayInputStream class is for, so its first constructor takes an array of bytes, buf, not an InputStream.
Additionally, ByteArrayInputStream has a second constructor that takes an array of bytes buf, an offset (the index of the first byte to read) and a length (the maximum number of bytes to read).
Firstly, we will create a method that loads a file into memory and returns an array of bytes.
    // Note that InputStream.readAllBytes() requires JDK 9 or above.
    public static byte[] getByteArray(File file) {
        try (InputStream in = new FileInputStream(file)) {
            return in.readAllBytes();
        } catch (IOException e) {
            e.printStackTrace();
        }
        // Reached only when reading the file failed.
        throw new InputStreamException("Error reading bytes from Input Stream!");
    }
Note that the exception being thrown, InputStreamException, is a custom one.
    public static void readUsingByteArrayInputStream() {
        // Note that we are passing our method getByteArray rather than an InputStream.
        try (InputStream in = new ByteArrayInputStream(getByteArray(Helper.getTextFileLocation()))) {
            TerminalPrinter.print(in);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
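The second constructor, with offset and length, is not used in the example above. As a minimal sketch of how it behaves, the hypothetical slice() method below reads only a window of an in-memory array.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
final class ByteArraySliceSketch {

    // Reads at most "length" bytes of the text, starting at index "offset".
    static String slice(String text, int offset, int length) {
        byte[] buf = text.getBytes(StandardCharsets.US_ASCII);
        ByteArrayInputStream in = new ByteArrayInputStream(buf, offset, length);
        StringBuilder sb = new StringBuilder();
        int datum;
        while ((datum = in.read()) != -1) {
            sb.append((char) datum);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(slice("Hello, Java", 7, 4)); // prints Java
        // A length that runs past the end of the array is simply cut short.
        System.out.println(slice("abc", 1, 10)); // prints bc
    }
}
```

Note that the stream reads straight from the array you pass in; it does not copy it.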
3.6 The Pushback Input Stream Class
The PushbackInputStream class is useful when you need to push back bytes you have already read. That is, you need to unread them, sending them back into the stream if you will. For instance, when reading a file path that contains escape characters.
Also, keep in mind that PushbackInputStream is a subclass of FilterInputStream, which adds extra functionality to a regular InputStream.
Similar to most InputStream implementations, it also takes an InputStream as a parameter. Additionally, you can specify the size of the pushback buffer by passing an integer, size, as a second parameter.
    public static void readUsingPushbackInputStream() {
        byte[] bytes = "// Is product on sale?.\nif (isOnSale) System.out.println(price / 1.25);\n".getBytes();
        // The pushback buffer holds 4 bytes, enough for " */\n".
        try (PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(bytes), 4)) {
            int datum, nextDatum;
            boolean isComment = false;
            while ((datum = in.read()) != -1) {
                if (isComment && datum == '\n') {
                    in.unread(" */\n".getBytes());
                    isComment = false;
                    continue;
                }
                if (datum == '/') {
                    if ((nextDatum = in.read()) == '/') {
                        in.unread('*');
                        isComment = true;
                    } else if (nextDatum != -1) {
                        // Do not push the end-of-stream marker back as a byte.
                        in.unread(nextDatum);
                    }
                }
                System.out.print((char) datum);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
In this example, we created a short program to rewrite comments in Java from in-line comment pattern //… to the multi-line comment pattern /* … */. Note that for time’s sake, we won’t cover scenarios where multiple in-line comments occur in a row. That is, for every line we will have /* … */ instead of having just one regardless of the number of lines.
3.6.1 Code Functionality, Behavior and Limitations
For starters, we must specify the size of the pushback buffer. That is, the maximum number of bytes that can be waiting in it at any one time. Keep in mind that every unread() call takes up space in this buffer, and every read() that consumes a pushed-back byte frees that space again. Note that the default value for the buffer size is 1 should none be provided. Also note that trying to unread more bytes than the buffer has room for will cause it to throw an IOException.
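The buffer rules described above can be checked with a few lines of code. This is a minimal sketch on an empty in-memory stream; the class PushbackCapacitySketch and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

// Hypothetical demo class; names are illustrative only.
final class PushbackCapacitySketch {

    // Returns true when pushing back "count" bytes fits into a buffer of "size" bytes.
    static boolean canUnread(int size, int count) {
        try (PushbackInputStream in =
                     new PushbackInputStream(new ByteArrayInputStream(new byte[0]), size)) {
            for (int i = 0; i < count; i++) {
                in.unread('x');
            }
            return true;
        } catch (IOException e) {
            // Thrown when the pushback buffer has no room left.
            return false;
        }
    }

    // Shows that reading a pushed-back byte frees its slot again.
    static boolean spaceIsReclaimed() {
        try (PushbackInputStream in =
                     new PushbackInputStream(new ByteArrayInputStream(new byte[0]), 1)) {
            in.unread('a');
            in.read();      // consumes 'a' and frees the single slot
            in.unread('b'); // fits again
            return in.read() == 'b';
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("4 bytes into size 4: " + canUnread(4, 4));
        System.out.println("5 bytes into size 4: " + canUnread(4, 5));
        System.out.println("space reclaimed:     " + spaceIsReclaimed());
    }
}
```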
As for the code above, it is quite simple. Firstly, we check whether the current character is ‘/’; if it is not, we print it. If it is, we check whether the next character is also ‘/’; if so, we unread ‘*’, set the isComment variable to true and print the first character. Otherwise, we just unread the next character and print the first one. Finally, at the top of each iteration, we check whether isComment is true and datum is equal to ‘\n’; if so, we unread ‘ */\n’. Note that here we unread 4 bytes at once.
Note that this time we used PushbackInputStream as the variable type because we have to use the unread() method, which InputStream does not declare. As a result, we replaced the in-line comment with the multi-line comment pattern.
/* Is product on sale?. */
if (isOnSale) System.out.println(price / 1.25);
Indeed, we could’ve solved it without using unread(). But, remember, this is for learning purposes.
CAUTION
PushbackInputStream has the side effect of invalidating the mark() or reset() methods of the InputStream used to create it. Use markSupported() to check any stream on which you are going to use mark()/reset().
(Java: The Complete Reference, Twelfth Edition, Herbert Schildt)
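The caution can be verified directly. The short sketch below, with hypothetical class and method names, asks the same in-memory stream whether it supports mark()/reset() before and after it is wrapped.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical demo class; names are illustrative only.
final class MarkSupportSketch {

    static boolean plainStreamSupportsMark() {
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3});
        return in.markSupported();
    }

    static boolean pushbackStreamSupportsMark() {
        // The same kind of stream, but wrapped in a PushbackInputStream.
        InputStream in = new PushbackInputStream(new ByteArrayInputStream(new byte[] {1, 2, 3}));
        return in.markSupported();
    }

    public static void main(String[] args) {
        System.out.println("ByteArrayInputStream: " + plainStreamSupportsMark());    // prints true
        System.out.println("PushbackInputStream:  " + pushbackStreamSupportsMark()); // prints false
    }
}
```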
3.7 The Object Input Stream Class
The ObjectInputStream class is chiefly used to deserialize primitive data and objects that were written using ObjectOutputStream. Additionally, the classes of those objects must implement Serializable or Externalizable.
First things first, we must create a class, instantiate it and save it into a file. Let’s call it: person.dat.
private static final File objectFile = new File("resources/person.dat"); public static File getObjectFileLocation() { return objectFile; }
Next, we must create our class.
    import java.io.Serializable;

    public class Person implements Serializable {
        private String name;
        private int age;

        public Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        public String getName() {
            return name;
        }

        public int getAge() {
            return age;
        }

        @Override
        public String toString() {
            return "Person{" + "name='" + name + "', age=" + age + "}";
        }
    }
As you can see, it is very simple, just for demonstration purposes.
Then, we must add a method to save the object.
public static Boolean saveObjectToFile(Person person) { try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(getObjectFileLocation()))) { out.writeObject(person); out.flush(); return true; } catch (IOException e) { e.printStackTrace(); } return false; }
Note that the method above takes a Person object as a parameter. Next, we will add a method to read data from this file and recreate our object.
private static void readObjectFromFile() { Person person; try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(Helper.getObjectFileLocation()))) { person = (Person) in.readObject(); System.out.println(person); } catch (IOException | ClassNotFoundException e) { e.printStackTrace(); } }
Lastly, we must add a method that saves the object to the file and then calls the method that reads it back into memory and prints it to the terminal.
    public static void readUsingObjectInputStream() {
        // Note that if save fails, it will throw an Exception.
        if (!Helper.saveObjectToFile(new Person("John", 25))) {
            throw new RuntimeException(Person.class.getName() + " not saved!");
        }
        readObjectFromFile();
    }
Finally, here is the output:
Person{name='John', age=25}
3.8 The Filter Input Stream Class
Interestingly, you cannot instantiate the FilterInputStream class directly because its constructor is protected. However, you can and should extend it to add extra functionality to the already existing input streams. For instance, the aforementioned BufferedInputStream and PushbackInputStream classes, among others, do exactly that.
Clearly, we can see Java making great use of the Decorator Pattern to extend functionality as needed.
In order to demonstrate how it works, we will create two customized InputStream classes named EncryptInputStream and DecryptInputStream.
3.8.1 The Custom Encrypt Input Stream Class
Before moving on, let’s check some important aspects of our class. Firstly, we have immutable variables that will be initialized in the constructor, which takes two parameters, an InputStream in and a byte key. Then, we read all bytes from the InputStream and pass them to the encryptBuf method, which XORs each byte with the provided key, saves the result to a new array of encrypted bytes, and returns it. Note that the key is also a byte ranging from -128 to 127. Lastly, for time’s sake, we won’t implement the overloaded read(byte[] b, int off, int len) method; we will just throw an Exception if it is invoked.
    package com.youlearncode.inputstreams;

    import com.youlearncode.exceptions.NotImplementedException;
    import java.io.*;

    public class EncryptInputStream extends FilterInputStream {
        private final byte[] buf;
        // Note that the variable count just holds the buf[] array length.
        private final int count;
        private int pos = 0, mark = 0;

        public EncryptInputStream(InputStream in, byte key) throws IOException {
            super(in);
            this.buf = encryptBuf(in.readAllBytes(), key);
            this.count = buf.length;
        }

        @Override
        public int read() {
            // Note that & 0xff keeps only the lowest 8 bits, so the result is in the 0-255 range.
            return (pos >= count) ? -1 : (buf[pos++] & 0xff);
        }

        @Override
        public int read(byte[] b, int off, int len) {
            // Note that here we just throw a Custom Exception.
            throw new NotImplementedException("Implementation missing!");
        }

        @Override
        public byte[] readAllBytes() {
            return buf;
        }

        private byte[] encryptBuf(byte[] buffer, byte key) {
            byte[] buf_ = new byte[buffer.length];
            for (int i = 0; i < buffer.length; i++) {
                buf_[i] = (byte) (buffer[i] ^ key);
            }
            return buf_;
        }

        @Override
        public synchronized void mark(int readLimit) {
            mark = pos;
        }

        @Override
        public synchronized void reset() {
            pos = mark;
        }

        @Override
        public boolean markSupported() {
            return true;
        }
    }
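The class above leaves read(byte[] b, int off, int len) unimplemented. For comparison, here is a sketch of an alternative design, not the article's class, that applies the XOR on the fly and therefore supports the bulk read as well. The names XorStreamSketch, XorInputStream and apply() are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class; an alternative to loading everything in the constructor.
final class XorStreamSketch {

    // XORs every byte with the key as it passes through the filter.
    static final class XorInputStream extends FilterInputStream {
        private final byte key;

        XorInputStream(InputStream in, byte key) {
            super(in);
            this.key = key;
        }

        @Override
        public int read() throws IOException {
            int datum = super.read();
            // Keep the end-of-stream marker intact; otherwise XOR and mask to 0-255.
            return (datum == -1) ? -1 : ((datum ^ key) & 0xff);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            int n = super.read(b, off, len);
            // n is -1 at end of stream, in which case the loop does not run.
            for (int i = off; i < off + n; i++) {
                b[i] ^= key;
            }
            return n;
        }
    }

    // Runs the data through the filter using the bulk read method.
    static byte[] apply(byte[] data, byte key) {
        try (InputStream in = new XorInputStream(new ByteArrayInputStream(data), key)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[256];
            int n;
            while ((n = in.read(chunk, 0, chunk.length)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] plain = "some text".getBytes();
        byte[] encrypted = apply(plain, (byte) 123);
        // Applying the same key a second time restores the original bytes.
        System.out.println(new String(apply(encrypted, (byte) 123)));
    }
}
```

The trade-off is that this version never holds the whole file in memory, whereas the article's version reads everything up front. Keep in mind that a single-byte XOR is only a teaching device, not real encryption.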
3.8.2 The Custom Decrypt Input Stream Class
Similarly, the DecryptInputStream class behaves just like its parent class EncryptInputStream, because applying XOR with the same key a second time restores the original bytes. Note that here we just need to call super(), passing the input stream and key as its parameters.
    public class DecryptInputStream extends EncryptInputStream {
        public DecryptInputStream(InputStream in, byte key) throws IOException {
            super(in, key);
        }
    }
Then, we need a method to save the encrypted stream into a file to be decrypted later.
public static boolean saveEncryptedInputStream(byte[] buf, File path) { try (OutputStream out = new FileOutputStream(path)) { out.write(buf); out.flush(); return true; } catch (IOException e) { e.printStackTrace(); } return false; }
Finally, we can first check whether the encrypted file exists. If it does, we just load it and print it to the terminal. If it doesn’t, we will read text_file.txt into memory, encrypt it using our brand new EncryptInputStream and save it to encrypted_file.txt. Then we will load it and print it to the terminal.
3.8.2.1 Saving and Loading Encrypted Files
    public static void readUsingEncryptedInputStream() {
        // Note that here we are negating the check with '!'.
        if (!Helper.getEncryptedFilePath().exists()) {
            try (InputStream in = new EncryptInputStream(new FileInputStream(Helper.getTextFileLocation()), (byte) 123)) {
                if (Helper.saveEncryptedInputStream(in.readAllBytes(), Helper.getEncryptedFilePath())) {
                    System.out.println("Encrypted File Saved Successfully!");
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        try (InputStream in = new DecryptInputStream(new FileInputStream(Helper.getEncryptedFilePath()), (byte) 123)) {
            TerminalPrinter.print(in);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
Due to the encryption process, the contents of the encrypted file will be rendered unreadable to humans.
/[[[[[[[[[[ [[[ [[[ Uvq=[ [[[[ [ [[ [ U[9W[[[ [[ [[ [[ [Uvq/[[ [[[[ [ [[[ [[ [[ [[[ Uvq:W[[ [[ [[[[[[[[W[[[[[\[ [[ Uvq5[[[ [[[W[\[ [[[ [[[[1 U
However, after decrypting it, the text file will be rendered readable again.
This is a piece of text added to help you perform all the operations in this article. Feel free to use any other resources of your preference. But, do not forget to provide the right location for it. The content here has no specific order nor does it have any purpose besides serving as a resource. Again, feel free to replace the content of this file as you please, but keep in mind it'll alter the results. Now that you are all set up, let's start coding and have some fun with Java.
3.9 The Data Input Stream Class
Similar to the ObjectInputStream class, the DataInputStream class also reads primitives and Strings, but it does NOT read objects. Additionally, DataInputStream is more efficient than ObjectInputStream. On the other hand, it requires more work and it is more error-prone.
Lastly, the order you write and read data really matters here. So, be careful when handling data because you may end up introducing a bug into your application.
For example, say you write two doubles into a file (height and width). Then, when you read them back, you do so in a different order. As a result, height and width now hold each other’s values, and that’s a problem.
For this example, we will name the file: person.bin.
private static final File datafile = new File("resources/person.bin"); public static File getDataFileLocation() { return datafile; }
Conveniently, we will add a method to save data from the Person object into a file.
    public static boolean saveDataToFile(Person obj) {
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(Helper.getDataFileLocation()))) {
            // Note that we saved name first.
            out.writeUTF(obj.getName());
            // Also note that we must use a different method for each primitive or String.
            out.writeInt(obj.getAge());
            return true;
        } catch (IOException e) {
            e.printStackTrace();
        }
        return false;
    }
Finally, it’s time to add the method for reading the data from the file back into a Person object.
public static void readUsingDataInputStream() { if (!Helper.saveDataToFile(new Person("Mike", 44))) throw new RuntimeException(Person.class.getName() + " not saved!"); try (DataInputStream in = new DataInputStream(new FileInputStream(Helper.getDataFileLocation()))) { Person person = new Person(in.readUTF(), in.readInt()); System.out.println(person); } catch (IOException e) { e.printStackTrace(); } }
Note that if we were to have a more complex class whose constructor parameters were themselves complex objects, reading data using DataInputStream would be a laborious task. That is, every single primitive and String would have to be set one by one.
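The height and width scenario mentioned earlier in this section can be sketched as follows. This is a minimal example, assuming an in-memory byte array in place of a file; the class DataOrderSketch and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; a byte array stands in for the data file.
final class DataOrderSketch {

    // Writes height first, then width.
    static byte[] write(double height, double width) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeDouble(height);
            out.writeDouble(width);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the values back in the order they were written: {height, width}.
    static double[] read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            double height = in.readDouble();
            double width = in.readDouble();
            return new double[] {height, width};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // A mismatched read: interprets the first 4 bytes of a double as an int.
    static int misreadAsInt(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = write(1.80, 0.60);
        double[] values = read(data);
        System.out.println("height=" + values[0] + ", width=" + values[1]);
        // No exception is thrown here; the value is simply meaningless.
        System.out.println("misread: " + misreadAsInt(data));
    }
}
```

The stream stores no type or field information, so neither swapping the two readDouble() calls nor calling the wrong read method raises an error. The mistake only shows up as wrong values.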
3.10 Other Input Stream Classes
As explained earlier, this article does not cover NIO (which will be covered in a separate article). Additionally, some other classes were left out. For instance:
- StreamTokenizer: the constructor that accepts an InputStream has been deprecated.
- LineNumberInputStream: class deprecated.
- StringBufferInputStream: class deprecated.
Also, some other subclasses of FilterInputStream are outside of our scope since they are not in the java.io package. For example: DigestInputStream, CheckedInputStream, DeflaterInputStream, InflaterInputStream and many others you may find throughout the Java library.
4. The Character Stream Classes in Java IO
In this section we will cover the classic character stream classes from java.io, most of which have been available since Java 1. The classes from java.nio (New Input Output) won’t be covered in this article.
Whenever we use byte streams to read files containing characters that are encoded with more than one byte, as most non-ASCII characters are in Unicode encodings such as UTF-8, data loss or incorrect data may occur. This is because byte streams treat each byte as a separate unit and do not recognize multibyte character encodings. Therefore, it is important to use character streams when working with text data to ensure that the characters are properly encoded and decoded.
4.1 The Problem of Using Byte Streams for Texts
Although we have been using byte streams to handle our simple text files, for educational purposes, it is not recommended. Note that our text does NOT contain any special characters. Additionally, using byte streams to handle text files can lead to issues, such as character encoding problems, inconsistent line endings, and difficulties with non-ASCII characters.
4.1.1 Adding More Resources and Files
Firstly, we need to add another file to demonstrate the differences between byte streams and character streams. Then, we need a File variable to access the file. The new file will contain the same sentence in English, in Portuguese (my native language), and in Greek (my friend’s native language).
private static final File fileGlobal = new File("resources/text_file_global.txt"); public static File getGlobalTextFileLocation() { return fileGlobal; }
Here is the content of our new file: text_file_global.txt
Note that Java is a robust OO programming language. Observe que Java é uma linguagem de programação OO robusta. Σημείωσε οτι η Java είναι μια ισχύρη αντικειμενοστρεφής γλώσσα προγραμματισμού.
4.2 The Challenges of Reading a Multilingual File Using Byte Streams
So, if we were to read the above text using byte streams and print each byte as a character, we would end up with data loss or garbled text. Note that for this example our text will be in UTF-8 encoding, whereas our print helper casts every single byte to a char, which in effect decodes the bytes as ISO 8859-1, and this will cause the sentence in Portuguese to become garbled. Additionally, ISO 8859-1 does NOT contain the Greek alphabet, which is covered by ISO 8859-7.
For instance, consider the next method:
    public static void readNonASCIIUsingFileInputStream() {
        try (InputStream in = new FileInputStream(Helper.getGlobalTextFileLocation())) {
            TerminalPrinter.print(in);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
This time, due to using byte streams, we ended up with a garbled text and loss of data.
4.2.1 Reading a UTF-8 Encoded Text Using Input Stream
Note that Java is a robust OO programming language. Observe que Java é uma linguagem de programação OO robusta. ΣημείÏÏε οÏι η Java είναι μια ιÏÏÏÏη ανÏικειμενοÏÏÏεÏÎ®Ï Î³Î»ÏÏÏα ÏÏογÏαμμαÏιÏμοÏ.
As you can see, only the English sentence was output perfectly. Now, if I change my text encoding to ISO 8859-1, the output will look like this:
4.2.2 Reading an ISO 8859-1 Encoded Text Using Input Stream
Note that Java is a robust OO programming language. Observe que Java é uma linguagem de programação OO robusta. ???????? ??? ? Java ????? ??? ?????? ?????????????????? ?????? ???????????????.
Evidently, the Portuguese sentence was fixed. But the Greek one is still completely unreadable. Moreover, even if we change the text encoding to ISO 8859-7 to represent all Greek characters, we will still end up with the following output:
4.2.3 Reading an ISO 8859-7 Encoded Text Using Input Stream
Note that Java is a robust OO programming language. Observe que Java ? uma linguagem de programa??o OO robusta. Óçìåßùóå ïôé ç Java åßíáé ìéá éó÷ýñç áíôéêåéìåíïóôñåöÞò ãëþóóá ðñïãñáììáôéóìïý.
As you can see, byte streams are not well-suited for handling text, particularly in today’s world where many text messages, comments, and posts contain not only special characters from various languages but also emojis. Additionally, several languages, including German, French, Greek, Finnish, and Dutch, use characters outside the ASCII range, which take more than one byte in UTF-8 and therefore break when each byte is treated as a character.
4.2.4 Using Charset Parameter in Character Stream Constructors for Decoding
Optionally, you can pass a specific Charset to certain character stream classes, such as InputStreamReader and FileReader, to specify the character encoding required to decode their contents. However, using a Charset other than ISO 8859-1, US-ASCII, or UTF-8 may result in longer execution times due to the need to search through all available charsets.
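As a minimal sketch of passing a Charset, the example below decodes the same bytes twice with InputStreamReader, once with the right charset and once with the wrong one. The class CharsetReaderSketch and its decode() method are hypothetical names, and the Greek sample word is written with Unicode escapes so it does not depend on the encoding of the source file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; a byte array stands in for the text file.
final class CharsetReaderSketch {

    // Decodes the bytes into text using the given charset.
    static String decode(byte[] bytes, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int datum;
            while ((datum = reader.read()) != -1) {
                sb.append((char) datum);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The Greek word for "language", six letters of 2 bytes each in UTF-8.
        String greek = "\u03b3\u03bb\u03ce\u03c3\u03c3\u03b1";
        byte[] utf8 = greek.getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(greek));      // prints true
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1).equals(greek)); // prints false
    }
}
```

Using the constants in StandardCharsets avoids looking a charset up by name.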
4.3 The Piped Reader Class
So, the main differences between PipedInputStream and PipedReader are that the former reads bytes (8-bit units) and its superclass is InputStream, whereas the latter reads chars (16-bit units) and its superclass is Reader.
As the code for this example is virtually the same, we won’t comment on it. That is, we will post the code samples and their results to avoid repetition.
Note that for this first example, we just need to replace PipedOutputStream with PipedWriter and PipedInputStream with PipedReader in the code.
// Requires java.io.* and java.util.Scanner.
public static void readUsingPipedReader() {
    PipedWriter out = new PipedWriter();
    Runnable runnableOut = getPipedWriterRunnable(out);
    new Thread(runnableOut).start();
    Runnable runnableIn = getPipedReaderRunnable(out);
    new Thread(runnableIn).start();
}

private static Runnable getPipedWriterRunnable(PipedWriter writer) {
    return () -> {
        try (writer; Scanner scan = new Scanner(Helper.getTextPtBrFileLocation())) {
            for (String line = scan.nextLine(); ; line = scan.nextLine()) {
                writer.write(line);
                if (scan.hasNextLine()) writer.write("\n");
                else break;
                // Pause between lines so the consumer visibly waits for the producer.
                try {
                    Thread.sleep(5000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    };
}

private static Runnable getPipedReaderRunnable(PipedWriter out) {
    return () -> {
        try (PipedReader reader = new PipedReader(out)) {
            int datum;
            while ((datum = reader.read()) != -1)
                System.out.print((char) datum);
        } catch (IOException e) {
            e.printStackTrace();
        }
    };
}
As a result, you get a perfect output. That is, an exact copy of the file content with no loss of data.
Note that Java is a robust OO programming language. Observe que Java é uma linguagem de programação OO robusta. Σημείωσε οτι η Java είναι μια ισχύρη αντικειμενοστρεφής γλώσσα προγραμματισμού.
4.4 The Input Stream Reader Class
The InputStreamReader class takes an InputStream (byte stream) or any of its subclasses and wraps it with a character stream that decodes bytes into characters using the specified Charset. If no Charset is provided, the stream uses the default character encoding of the platform. Note that the default Charset typically depends upon the locale and charset of the underlying operating system.
Note that there is overhead involved in wrapping an InputStream. Therefore, whenever possible, it is preferable to use character streams instead of byte streams when handling text data.
public static void readUsingInputStreamReader() {
    try (InputStream in = new FileInputStream(Helper.getGlobalTextFileLocation());
         Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
        int datum;
        // Read from the decoding reader, not from the raw byte stream.
        while ((datum = reader.read()) != -1)
            System.out.print((char) datum);
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Also note that this time we passed a Charset object in the constructor. Here is the output:
Note that Java is a robust OO programming language. Observe que Java é uma linguagem de programação OO robusta. Σημείωσε οτι η Java είναι μια ισχύρη αντικειμενοστρεφής γλώσσα προγραμματισμού.
4.5 The File Reader Class
The FileReader class is a subclass of InputStreamReader and it is used to open a direct connection to a file by taking a File object as an argument. Even if a String name is passed, the FileReader class will internally instantiate a File object. Furthermore, since Java 11, a Charset can be passed to specify the appropriate encoding.
Note that on our computer the default Charset is UTF-8. However, as there is no way to know the default Charset on your computer, we passed it as a parameter to guarantee it will be used for decoding.
public static void readUsingFileReader() {
    try (Reader reader = new FileReader(Helper.getGlobalTextFileLocation(), StandardCharsets.UTF_8)) {
        int datum;
        // The only stream in scope is reader, so that is what we read from.
        while ((datum = reader.read()) != -1)
            System.out.print((char) datum);
    } catch (IOException e) {
        e.printStackTrace();
    }
}
As you can see, we got the same result as the previous approach.
Note that Java is a robust OO programming language. Observe que Java é uma linguagem de programação OO robusta. Σημείωσε οτι η Java είναι μια ισχύρη αντικειμενοστρεφής γλώσσα προγραμματισμού.
Keep in mind that though the results were identical, the latter approach was faster in our tests.
4.6 The String Reader Class
The StringReader class is useful when you need to read characters from a String object in memory instead of a file or other data source. Bear in mind that you cannot provide a character encoding, and none is needed, because a String already holds decoded characters. Reading huge chunks of text as String objects may lead to performance issues.
First things first, we need a String object to read from.
private static String text = """
        Note that Java is a robust OO programming language.
        Observe que Java é uma linguagem de programação OO robusta.
        Σημείωσε οτι η Java είναι μια ισχύρη αντικειμενοστρεφής γλώσσα προγραμματισμού.
        """;
Then, it is all the same boilerplate code other than passing the String object to the constructor.
public static void readUsingStringReader() {
    try (Reader reader = new StringReader(text)) {
        int datum;
        while ((datum = reader.read()) != -1)
            System.out.print((char) datum);
    } catch (IOException e) {
        e.printStackTrace();
    }
}
4.7 The Char Array Reader Class
Similarly, you can use the CharArrayReader class, which is virtually the same as the StringReader class except that it accepts a char array and offers one extra constructor should you need to work with a subset of the array.
private static char[] chars = """
        Note that Java is a robust OO programming language.
        Observe que Java é uma linguagem de programação OO robusta.
        Σημείωσε οτι η Java είναι μια ισχύρη αντικειμενοστρεφής γλώσσα προγραμματισμού.
        """.toCharArray();
As you can see, the code is very similar to the previous one.
public static void readUsingCharArrayReader() {
    try (Reader reader = new CharArrayReader(chars)) {
        int datum;
        while ((datum = reader.read()) != -1)
            System.out.print((char) datum);
    } catch (IOException e) {
        e.printStackTrace();
    }
}
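The extra constructor mentioned above takes an offset and a length. The following sketch shows one way it might be used; `CharArraySubsetDemo` and `readSlice` are hypothetical names chosen for illustration.

```java
import java.io.CharArrayReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

public class CharArraySubsetDemo {

    // Reads only chars[offset .. offset + length - 1] through a CharArrayReader.
    public static String readSlice(char[] chars, int offset, int length) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new CharArrayReader(chars, offset, length)) {
            int datum;
            while ((datum = reader.read()) != -1) {
                sb.append((char) datum);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char[] chars = "Note that Java is robust".toCharArray();
        // The word "Java" starts at index 10 and is 4 characters long.
        System.out.println(readSlice(chars, 10, 4)); // prints "Java"
    }
}
```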
4.8 The Filter Reader Class
In a similar fashion, the FilterReader abstract class is meant to be extended to add functionality to a stream. Note, however, that unlike FilterInputStream, it has only one subclass in java.io: PushbackReader. BufferedReader and LineNumberReader decorate an underlying Reader in the same way, but BufferedReader extends Reader directly and LineNumberReader extends BufferedReader.
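To give an idea of what extending FilterReader can look like, here is a minimal sketch of a reader that upper-cases everything passing through it. `UpperCaseReader`, `UpperCaseReaderDemo`, and `shout` are hypothetical names and are not part of the JDK or of the article's project.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class UpperCaseReaderDemo {

    // Hypothetical FilterReader subclass: upper-cases each character it reads.
    static class UpperCaseReader extends FilterReader {
        UpperCaseReader(Reader in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int c = super.read();
            return c == -1 ? -1 : Character.toUpperCase(c);
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            // When n is -1 (end of stream) the loop body never runs.
            for (int i = off; i < off + n; i++) {
                buf[i] = Character.toUpperCase(buf[i]);
            }
            return n;
        }
    }

    // Reads the whole text through the filter and returns the transformed result.
    public static String shout(String text) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new UpperCaseReader(new StringReader(text))) {
            int datum;
            while ((datum = reader.read()) != -1) {
                sb.append((char) datum);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(shout("java io")); // prints "JAVA IO"
    }
}
```

Overriding both read() variants matters because wrappers such as BufferedReader call the array version rather than the single-character one.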
4.9 The Buffered Reader Class
Besides the already known advantages of using buffered streams regardless of whether it is a byte stream or character stream, the BufferedReader class offers a method to read an entire line at once.
public static void readUsingBufferedReader() {
    try (BufferedReader reader = new BufferedReader(
            new FileReader(Helper.getGlobalTextFileLocation(), StandardCharsets.UTF_8))) {
        for (String line = reader.readLine(); line != null; line = reader.readLine())
            System.out.println(line);
    } catch (IOException e) {
        e.printStackTrace();
    }
}
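BufferedReader also exposes the lines() method mentioned in the introduction, which turns the content into a Stream<String>. A small sketch of how it could be combined with stream operations follows; `BufferedLinesDemo` and `nonBlankLines` are hypothetical names, and a StringReader replaces the file to keep the example self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.stream.Collectors;

public class BufferedLinesDemo {

    // Collects every non-blank line of the text using BufferedReader.lines().
    public static List<String> nonBlankLines(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.lines()
                    .filter(line -> !line.isBlank())
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(nonBlankLines("first\n\nsecond\n")); // prints [first, second]
    }
}
```

The stream is consumed inside the try block because lines() reads lazily and the reader must still be open while the stream is processed.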
4.10 The Line Number Reader Class
The LineNumberReader class is a subclass of BufferedReader that keeps track of line numbers as it reads, which makes it easy to print them alongside each line. Note, however, that setLineNumber(int lineNumber) does not actually change the current position in the stream; it only changes the value that will be returned by getLineNumber().
public static void readUsingLineNumberReader() {
    try (LineNumberReader reader = new LineNumberReader(
            new FileReader(Helper.getGlobalTextFileLocation(), StandardCharsets.UTF_8))) {
        for (String line = reader.readLine(); line != null; line = reader.readLine())
            System.out.println(reader.getLineNumber() + " - " + line);
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Also note that this approach was considerably slower than a plain BufferedReader in our tests.
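The behavior of setLineNumber() is easier to see in a short experiment. In this sketch, `LineNumberOffsetDemo` and `numberFrom` are hypothetical names; the reader still starts at the first line of the text, and only the reported numbers shift.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class LineNumberOffsetDemo {

    // Numbers the lines of the text starting at firstNumber instead of 1.
    public static String numberFrom(String text, int firstNumber) {
        StringBuilder sb = new StringBuilder();
        try (LineNumberReader reader = new LineNumberReader(new StringReader(text))) {
            // Only the counter changes; the stream position stays at the beginning.
            reader.setLineNumber(firstNumber - 1);
            for (String line = reader.readLine(); line != null; line = reader.readLine()) {
                sb.append(reader.getLineNumber()).append(" - ").append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints "100 - alpha" and "101 - beta": no lines were skipped.
        System.out.print(numberFrom("alpha\nbeta", 100));
    }
}
```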
4.11 The Pushback Reader
Likewise, PushbackReader offers all the same functionalities as PushbackInputStream, including the read() and unread() methods, as well as their overloaded versions. However, instead of taking a byte[] array, PushbackReader takes a char[] array. Additionally, PushbackReader has a ready() method that checks whether the stream is ready to be read without blocking. One major difference between PushbackReader and PushbackInputStream is that PushbackReader methods are synchronized, while the methods in PushbackInputStream are not.
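Before the larger scenario, a common small use of read() followed by unread() is peeking at the next character without consuming it. The sketch below assumes hypothetical names (`PushbackPeekDemo`, `peek`, `peekThenRead`) and relies on the default pushback buffer of one character.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class PushbackPeekDemo {

    // Returns the next character without consuming it, or -1 at end of stream.
    public static int peek(PushbackReader reader) throws IOException {
        int c = reader.read();
        if (c != -1) {
            reader.unread(c); // put it back so the next read() sees it again
        }
        return c;
    }

    // Peeks at the first character, then reads the whole text as usual.
    public static String peekThenRead(String text) {
        StringBuilder sb = new StringBuilder();
        try (PushbackReader reader = new PushbackReader(new StringReader(text))) {
            int first = peek(reader);
            if (first != -1) {
                sb.append((char) first).append(':');
            }
            int datum;
            while ((datum = reader.read()) != -1) {
                sb.append((char) datum);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(peekThenRead("Java")); // prints "J:Java"
    }
}
```

The guard against -1 matters: pushing back the end-of-stream marker would insert a bogus character into the stream.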
4.11.1 Extracting Data from a Character Stream with Pushback Reader
For example, consider the following scenario: we have a stream of characters from which we need to determine whether the characters being read represent a code (sequence of digits), a description or whitespaces that may appear erratically between them. Keep in mind the description may be a compound one (formed by two or more nouns), so we need to check for capital letters as well.
// Requires java.io.PushbackReader, java.io.StringReader, java.io.IOException, java.util.Map and java.util.HashMap.
public static void readUsingPushbackStreamReader() {
    String stream = "101 Cars111 Motorbikes 122 PickupTrucks131 Bus142 Scooters 190 Helicopter199Airplanes";
    Map<Integer, String> items = new HashMap<>();
    try (PushbackReader reader = new PushbackReader(new StringReader(stream))) {
        int datum, value = 0;
        while (reader.ready() && (datum = reader.read()) != -1) {
            if (Character.isWhitespace(datum)) continue;
            if (Character.isDigit(datum)) {
                // Note that the char values for the digits '0' to '9' range from 48 to 57.
                value = datum - '0';
                // Note that for every digit we multiply the value by 10. Then we add the current digit.
                while (Character.isDigit(datum = reader.read()))
                    value = (value * 10) + (datum - '0');
            }
            if (Character.isUpperCase(datum)) {
                StringBuilder sb = new StringBuilder();
                sb.append((char) datum);
                while (Character.isLetter(datum = reader.read())) {
                    // Note that this checks for compound nouns and separates them with a whitespace.
                    // See PickupTrucks -> Pickup Trucks!
                    if (Character.isUpperCase(datum)) sb.append(' ');
                    sb.append((char) datum);
                }
                // Note that at this point #datum is not a letter. Perhaps a digit, a whitespace or the end of the stream.
                items.put(value, sb.toString());
                // Therefore, we must un-read the character to be properly reevaluated,
                // unless it is the end-of-stream marker (-1), which must not be pushed back.
                if (datum != -1) reader.unread(datum);
            }
        }
        System.out.println(items);
    } catch (IOException e) {
        e.printStackTrace();
    }
}
As a result, we broke down the stream into meaningful data.
{131=Bus, 101=Cars, 199=Airplanes, 122=Pickup Trucks, 142=Scooters, 190=Helicopter, 111=Motorbikes}
5. Different File Reading Approaches: A Comparative Analysis
With the variety of different streams available, it may seem overwhelming to choose the right one for your application. Don’t worry! We’ll break it down and help you easily select the most appropriate stream for your needs. For starters, let’s check the byte streams for all types of files except text-based ones.
5.1 Picking the Right Byte Stream Class
BYTE STREAM CLASS | WHAT IS IT GOOD FOR? |
---|---|
PipedInputStream | Facilitates the safe passing of data, at the byte level, between Threads, by creating a communication channel. Moreover, it effectively solves the Producer-Consumer problem. |
FileInputStream | The simplest way to read a file in Java as it provides no additional methods for reading bytes. Note that it is usually slow compared to BufferedInputStream and only requires a File object to work. |
BufferedInputStream | Wraps any InputStream, most commonly a FileInputStream, and improves reading performance through a configurable buffer that reduces system calls. It also supports mark() and reset(), which FileInputStream does not. |
SequenceInputStream | Simplifies reading multiple streams in sequence, though not in parallel. When dealing with more than two streams, you can pass an Enumeration, obtained for example from Vector.elements() or Collections.enumeration(). |
ByteArrayInputStream | Makes it possible to work with data that is already in-memory, such as a file that has been loaded into the computer’s memory. That is, it allows you to create an input stream directly from a byte array, providing a convenient way to read the contents of the array as if it were a stream of bytes. |
PushbackInputStream | Allows you to ‘push back’ bytes that have been previously read or even insert bytes that were never in the stream. In addition to the inherited methods, it provides an additional unread() method, which allows you to ‘unread’ bytes. |
ObjectInputStream | Lets you deserialize objects previously written with ObjectOutputStream, while preserving their original characteristics. Additionally, it provides methods for reading primitives and strings from an input stream. |
FilterInputStream | Enables you to extend the functionalities of an InputStream to meet specific requirements. That is, it is designed to be extended, allowing you to customize and solve problems that are not yet covered by the existing API. |
DataInputStream | Lets you read strings and primitive data types from an InputStream efficiently. It does not handle object serialization, so it is not recommended for complex objects: without built-in support, the code becomes verbose and error-prone. |
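As a brief illustration of the SequenceInputStream row, the sketch below chains several in-memory streams through an Enumeration. `SequenceDemo` and `concat` are hypothetical names, and ByteArrayInputStream instances stand in for files so the example runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SequenceDemo {

    // Reads all chunks one after another as if they were a single stream.
    public static String concat(List<byte[]> chunks) {
        List<InputStream> streams = new ArrayList<>();
        for (byte[] chunk : chunks) {
            streams.add(new ByteArrayInputStream(chunk));
        }
        // Collections.enumeration adapts any collection to the Enumeration the constructor expects.
        try (InputStream in = new SequenceInputStream(Collections.enumeration(streams))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<byte[]> chunks = List.of(
                "Java ".getBytes(StandardCharsets.UTF_8),
                "I/O".getBytes(StandardCharsets.UTF_8));
        System.out.println(concat(chunks)); // prints "Java I/O"
    }
}
```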
5.2 Picking the Right Character Stream Class
Similarly, if you are not sure which Reader subclass to use, here is a brief summary of the ones covered here highlighting their main functionalities.
CHARACTER STREAM CLASS | WHAT IS IT GOOD FOR? |
---|---|
InputStreamReader | Enables the conversion of a byte stream into a character stream, allowing you to specify the appropriate encoding for proper reading. |
FileReader | The simplest way to read a text-based file, as it provides no additional methods for reading character data. Note that it is usually slow compared to BufferedReader and only requires a File object to work. |
StringReader | Allows you to convert a String object stored in memory into a stream of characters. Note that you cannot specify any character encoding while using StringReader nor does it add any additional methods. |
CharArrayReader | Functions similar to StringReader, but instead of a String object, it takes a char array as input. Additionally, the CharArrayReader constructor allows you to specify a subset of the char array directly. |
FilterReader | Enables you to extend the functionalities of a Reader to meet specific requirements. That is, it is designed to be extended, allowing you to customize and solve problems that are not yet covered by the existing API. |
BufferedReader | Wraps any Reader, most commonly a FileReader, and improves reading performance through a configurable buffer that reduces system calls. In addition to the inherited methods, it offers readLine() and lines() for reading text line by line. |
LineNumberReader | A subclass of BufferedReader that keeps track of line numbers during reading. However, setLineNumber() only changes the reported number; it does not change the stream position. |
PushbackReader | Allows you to ‘push back’ characters that have been previously read or even insert characters that were never in the stream. In addition to the inherited methods, it provides an additional unread() method, which allows you to ‘unread’ characters. |
6. Conclusion
In conclusion, we hope that this article has provided you with a comprehensive overview of Java I/O API and its various classes for byte and character streams. By now, you should have a better understanding of how to use these classes to read data from different sources, handle text in different languages, and simplify your code using try-with-resources. Note that Java I/O API is an essential part of any Java developer’s toolkit. That is, mastering it will make you a more efficient and effective programmer. As is customary, check out our GitHub page for the source code used in this article and start exploring the possibilities of Java I/O today.