Thursday 6 February 2014

File systems and hierarchy

Introduction

Hello, and welcome to my first post with GNU / Linux information. In this "chapter" of my blog, I'm going to explain file systems and the directory tree hierarchy on GNU / Linux systems. I'm human and I will make mistakes, so feel free to comment or write to me on Twitter with corrections and suggestions. I'll address them as soon as possible, but keep in mind this page is a hobby and not a job.

File System

Definition

Let's begin with the definition of file system that I won't explain (at least, not in this post).
According to LPI Linux Certification in a Nutshell, 3rd Edition (O'Reilly), "it refers to the structure and contents of some storage medium. To view the contents of a filesystem (in this sense of the word) on a Linux system, the device must be mounted, or attached to the hierarchical directory structure on the system."

I'll explain mounting in another post.

Now we shall move on to the concept I'll develop in the next paragraphs. According to Wikipedia:

"In computing, a file system (or filesystem) is used to control how data is stored and retrieved. Without a file system, information placed in a storage area would be one large body of data with no way to tell where one piece of information stops and the next begins. By separating the data into individual pieces, and giving each piece a name, the information is easily separated and identified."



Let's put it like this. Imagine your hard drive as a blank book with a single, enormous page. You could not keep several separate notes in it unless you split it into pages (or chapters).
Now, let's say your hard drive is a big notebook with several pages, chapters and sections. We could say bits or bytes (1 byte equals 8 bits) are the pages, the sectors or clusters are the chapters and the partitions are the sections. The whole book would be a physical hard disk, or a Logical Volume spanning more than one hard disk.



That brings us to the definitions of bit, byte, sector, cluster, partition and Logical Volume. I'll briefly explain what those are.

Glossary

Bit: Basically, it's a binary value (that is, either 0 or 1) that the computer interprets as data. A combination of 0s and 1s in a certain order can form, for example, a text file, a multimedia file, etc.

Byte: A group of 8 bits. Nowadays this is usually the smallest amount of data we speak of when talking about files.
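
To make the two definitions above concrete, here is a small Python sketch. The sample text "Tux" and the helper name to_bits are my own choices for illustration, not anything standard. It shows the bytes behind a short text and the 8 bits that make up each byte.

```python
def to_bits(data: bytes) -> list:
    """Return each byte of data as a string of 8 binary digits."""
    return [format(byte, "08b") for byte in data]


text = "Tux"                   # sample text, chosen only for illustration
raw = text.encode("ascii")     # one byte per character in ASCII

# Print every character next to the 8 bits that encode it.
for char, bits in zip(text, to_bits(raw)):
    print(char, bits)
```

Each line of output pairs one character with one byte, which is why a three-letter ASCII text takes three bytes.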

Sector: A subdivision of a track on a magnetic disk or optical disc. Each sector stores a fixed amount of user-accessible data, traditionally 512 bytes. As far as I remember, the default sector size when using partitioning tools such as fdisk is indeed 512 bytes.

Cluster: In this case we are talking about a data cluster (not to be confused with a cluster as a group of computers). In the opening words of Wikipedia's article: "In computer file systems, a cluster or allocation unit is a unit of disk space allocation for files and directories. To reduce the overhead of managing on-disk data structures, the filesystem does not allocate individual disk sectors by default, but contiguous groups of sectors, called clusters.

On a disk that uses 512-byte sectors, a 512-byte cluster contains one sector, whereas a 4-kibibyte (KiB) cluster contains eight sectors.
A cluster is the smallest logical amount of disk space that can be allocated to hold a file."
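
The arithmetic in the quote can be sketched in a few lines of Python. The sizes are the ones from the quote (512-byte sectors, 4 KiB clusters); the function names and the 5,000-byte sample file are my own, and treating an empty file as needing zero clusters is a simplification.

```python
SECTOR_SIZE = 512          # bytes, the traditional sector size
CLUSTER_SIZE = 4 * 1024    # bytes, the 4 KiB example from the quote


def clusters_needed(file_size, cluster_size=CLUSTER_SIZE):
    """Number of whole clusters needed to hold file_size bytes."""
    # Round up: a partly used cluster still has to be allocated entirely.
    return (file_size + cluster_size - 1) // cluster_size


def allocated_bytes(file_size, cluster_size=CLUSTER_SIZE):
    """Disk space actually reserved for a file of file_size bytes."""
    return clusters_needed(file_size, cluster_size) * cluster_size


print(CLUSTER_SIZE // SECTOR_SIZE)   # sectors in one cluster
print(clusters_needed(5000))         # clusters for a 5,000-byte file
print(allocated_bytes(5000))         # bytes reserved for that file
```

Because space is handed out in whole clusters, a file usually occupies a bit more disk space than its own size.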

And last, but not least, a partition is a portion of disk space that we (or the system installation procedure) set aside for a particular purpose (a mount point, backup files, etc.).

I'll let you read more about Logical Volume Management here since I'm not that good at explaining it, although I have implemented it on several occasions. If you are experienced (or are not, but feel adventurous and would like to give it a try), I encourage you to install your system using this feature, as it can make it easier to add space later if a partition runs out of it.

For more information, you can read Fedora 17's installation guide.

GNU / Linux Filesystem types

Even though there are a lot of different filesystems that can be used in Open Source Operating Systems based on GNU / Linux technologies, the ones we most commonly use belong to the Extended File System (ext) family.


  • ext2 - Second Extended Filesystem is an established, mature GNU/Linux filesystem that is very stable. A drawback is that it does not have journaling support or barriers. Lack of journaling can result in data loss in the event of a power failure or system crash. It may also be inconvenient for root (/) and /home partitions because file-system checks can take a long time. An ext2 filesystem can be converted to ext3.
  • ext3 - Third Extended Filesystem is essentially the ext2 system with journaling support and write barriers. It is backward compatible with ext2, well tested, and extremely stable.
  • ext4 - Fourth Extended Filesystem is a newer filesystem that is also compatible with ext2 and ext3. It provides support for volumes with sizes up to 1 exabyte (i.e. 1,048,576 terabytes) and file sizes up to 16 terabytes. It increases the 32,000 subdirectory limit in ext3 to 64,000. It also offers online defragmentation capability.
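
As a quick check of the "1 exabyte, i.e. 1,048,576 terabytes" figure in the ext4 item, here is a short Python sketch. It assumes the binary units (1 terabyte = 2^40 bytes, 1 exabyte = 2^60 bytes), which is the reading under which the two numbers agree.

```python
TIB = 2 ** 40   # bytes in one terabyte, binary sense (tebibyte)
EIB = 2 ** 60   # bytes in one exabyte, binary sense (exbibyte)

# How many terabytes fit in one exabyte.
terabytes_per_exabyte = EIB // TIB
print(f"{terabytes_per_exabyte:,}")
```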

There is also another type that is not used for storing files:
  • swap - Format used for swap partitions.

A note on journaling, since ext3 and ext4 use it. The following is also quoted from the Arch Linux wiki:



"Journaling provides fault-resilience by logging changes before they are committed to the filesystem. In the event of a system crash or power failure, such file systems are faster to bring back online and less likely to become corrupted. The logging takes place in a dedicated area of the filesystem.
Not all journaling techniques are the same. Only ext3 and ext4 offer data-mode journaling, which logs both data and meta-data. Data-mode journaling comes with a speed penalty and is not enabled by default. The other filesystems provide ordered-mode journaling, which only logs meta-data. While all journaling will return a filesystem to a valid state after a crash, data-mode journaling offers the greatest protection against corruption and data loss. There is a compromise in system performance, however, because data-mode journaling does two write operations: first to the journal and then to the disk. The trade-off between system speed and data safety should be considered when choosing the filesystem type."
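
The idea of "logging changes before they are committed" can be illustrated with a toy model in Python. This is only a sketch of the principle, not how ext3 or ext4 are implemented: a dictionary stands in for the disk, a list stands in for the journal, and the class and method names are hypothetical. It logs the data itself, so it is closest in spirit to the data-mode journaling described above.

```python
class ToyJournaledStore:
    """Toy model: a dict plays the disk, a list plays the journal."""

    def __init__(self):
        self.disk = {}       # the committed state
        self.journal = []    # changes logged but not yet applied

    def log(self, name, content):
        """First write: record the intended change in the journal."""
        self.journal.append((name, content))

    def commit(self):
        """Second write: apply the logged changes, then empty the journal."""
        for name, content in self.journal:
            self.disk[name] = content
        self.journal.clear()

    def recover(self):
        """After a crash, replay whatever is still in the journal."""
        self.commit()


store = ToyJournaledStore()
store.log("notes.txt", "hello")
# Imagine the power fails here, before commit() runs.
store.recover()
print(store.disk)
```

The two steps (log, then commit) are the "two write operations" the quote mentions as the cost of this approach.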

For more information, I'd recommend following the Wikipedia links in the original article, which have been kept on the 4 items above as well.

Tree hierarchy

Definition

OK, now let's continue with how all of the above fits onto a hard drive. Every GNU/Linux operating system (like others) tends to organise its files into directories, grouped according to their function. For example, configuration files can usually be found in /etc. However, I believe there are a few exceptions (correct me if I'm mistaken, since I cannot recall an example).
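
If you want to see this organisation on your own machine, a minimal Python sketch like the following lists the directories found directly under the root of the tree. The function name top_level_dirs is my own, and the exact list you get depends on your distribution.

```python
from pathlib import Path


def top_level_dirs(root="/"):
    """Return the sorted names of the directories directly under root."""
    return sorted(p.name for p in Path(root).iterdir() if p.is_dir())


print(top_level_dirs())
```

On a typical GNU/Linux system the result includes names such as etc, home, usr and var, which are discussed next.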

I'll try to explain what the main partitions hold:

  • /boot - The boot partition mainly holds files used for booting, such as vmlinuz. On some systems it needs to have the bootable flag enabled, but I've recently read that this is not necessarily needed on GNU / Linux systems. If you are not sure whether the flavour (Linux distribution) you are planning to install needs it, keep the bootable flag on just in case.
  • swap - Swap partition. This partition does not hold files arranged by purpose; it is used when the system's RAM is running short. Also, when your computer hibernates, the contents of memory are stored in it. You may also know it as virtual memory (especially if you're coming from a Windows environment).
  • /  - Root  (not to be confused with /root or root user).
  • /usr - Programs, libraries and documentation installed for all users.
  • /var - Variable data such as logs, caches and spool files.
  • /tmp - Temporary files.
  • /home - Users' personal files.

I usually take the partitioning scheme suggested in the Fedora Project Documentation (and many other sites / books / papers) as a reference for an 8 GiB system installation:

  • /boot - 100 MiB
  • swap - 1 GiB
  • /  - 500 MiB
  • /usr - 4 GiB
  • /var - 2 GiB
  • /tmp - 500 MiB
  • /home - As much as possible; remember this partition will hold the users' personal files, unless users save their files somewhere else. Even then, that other partition (for example, one with a Windows-compatible filesystem) does not replace /home, which should still exist if you are planning to divide the OS directories properly.
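
To see how much the fixed-size partitions in the list above take up, and what that leaves for /home, here is a small Python sketch. The sizes are copied from the list; the names and the 20 GiB disk used in the last line are hypothetical values chosen for illustration.

```python
MIB_PER_GIB = 1024

# Fixed sizes from the list above, expressed in MiB.
FIXED_PARTITIONS_MIB = {
    "/boot": 100,
    "swap": 1 * MIB_PER_GIB,
    "/": 500,
    "/usr": 4 * MIB_PER_GIB,
    "/var": 2 * MIB_PER_GIB,
    "/tmp": 500,
}


def fixed_total_mib():
    """Total space taken by the fixed-size partitions, in MiB."""
    return sum(FIXED_PARTITIONS_MIB.values())


def home_space_mib(disk_gib):
    """MiB left for /home on a disk of disk_gib GiB (negative if it does not fit)."""
    return disk_gib * MIB_PER_GIB - fixed_total_mib()


print(fixed_total_mib())     # total of the fixed partitions, in MiB
print(home_space_mib(20))    # 20 GiB is a hypothetical disk size
```

With these figures the fixed partitions add up to 8,268 MiB, a little over 8 GiB, so /home needs disk space beyond that.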

When partitioning and installing a system (check the next post for further information), there are some directories that must be kept inside / (the root partition) so that the boot process can complete properly.

  • /bin and /sbin, as they contain essential system programs that need to run at start-up.
  • /dev, as it contains the device files that represent the connected devices.
  • /etc, as it holds the configuration files, so it cannot be placed on a separate partition either.
  • /lib, as it holds shared libraries, some of which are needed during the boot process.
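
On an installed system you can get a rough idea of whether these directories live on the root partition by comparing device numbers. The Python sketch below uses os.stat, whose st_dev field identifies the device a path resides on; the function name same_filesystem is my own. I left /dev out of the loop on the assumption that on many current systems it is a virtual filesystem populated at boot, so it would report a different device.

```python
import os


def same_filesystem(path, root="/"):
    """True if path is on the same device as root (a rough check)."""
    return os.stat(path).st_dev == os.stat(root).st_dev


# /dev is left out on purpose, see the note above.
for directory in ("/bin", "/sbin", "/etc", "/lib"):
    if os.path.exists(directory):
        print(directory, same_filesystem(directory))
```

A directory that was given its own partition, such as /home in the scheme above, would normally report False here.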

Next chapter

OK, if you have read the full post, I thank you deeply. If you have not, I thank you anyway for visiting my site. In the next post I will explain fdisk and mkfs, and I'll teach you how to install a GNU/Linux OS from scratch.

Once again, feel free to send me comments / suggestions / corrections.

Saturday 1 February 2014

Bienvenidos / Welcome

Español:
Hola. Les doy la bienvenida a mi blog. Por el momento y hasta que pueda levantar mi propio sitio web publicaré aquí artículos relacionados con el maravilloso mundo del simpático pingüino conocido como Tux y su entorno.

Estoy aprendiendo a usar apache y la página en cuestión va a ser mi pequeño proyecto de aprendizaje. Llegado el momento, trataré de subir el contenido de esta página al sitio web en cuestión.

Las distribuciones que utilizo por motivos de estudio son básicamente Debian y CentOS, así que trataré de publicar los artículos con los comandos para ambas definiciones. Por otro lado, asumiendo que los comandos están en inglés y demás, probablemente inicie el sitio en inglés y después haga la traducción al Español.

La bibliografía que utilizaré se basa en conocimientos obtenidos de foros, wikipedia, libros como Linux in a Nutshell y bibliografía de cursos a los que he asistido o estoy asistiendo, junto con notas personales.

English:

Hello. Welcome to my blog. For the time being, and until I host my own website, I'll publish articles here related to the wonderful world of our friendly penguin known as Tux and its environment.

I'm currently learning to use Apache, and the website will be my small learning project. When the time comes, I'll upload all of this content to my site.

The distros I use for study purposes are basically Debian and CentOS, so I'll try to include commands for both distributions in the articles I publish. Also, since commands are in English and information online is mainly found in that language, I'll probably start the site in English and translate it into Spanish later.

The material I'll include on this blog is based on knowledge obtained from forums, Wikipedia, books such as Linux in a Nutshell, and papers from seminars and courses I have taken or am currently attending, along with some personal notes.