When creating a Java web application that requires access restrictions on some pages, one way to do it is to include user authentication logic within each JSP, or in a JSP fragment that is included in all of these JSPs, and to perform a redirect or forward to an access denied page when the check fails.

    request.getRequestDispatcher("/accessDenied.html").forward(request, response);

However, a forward does not halt the calling JSP: the statements after the dispatch continue to execute on the same request thread, so the server still processes the restricted JSP's logic even for a user who is not authorised. If one looks at the logging output from the server, or intercepts the HTTP traffic using, for example, a packet sniffer (which are commonplace enough to exist as Firefox extensions), one can see this happening live; depending on response buffering and where in the page the check sits, some of the restricted content may even be written out to the client. This method therefore clearly raises some nasty security concerns.

Furthermore, from a software engineering or architectural design point of view, using a JSP as a filter violates MVC design principles: it makes the JSP act as a controller as well as a view.

The proper way to implement restricted access to certain pages is to use filters from the Servlet API, and then to configure the web application to map selected pages to these filters. This method utilises additional capability provided by the web container that runs on the server: the container constructs filter chains based on the filter mappings defined. Most crucially, these filters allow the authentication logic and the consequent branching to be executed before the JSP is ever invoked.

Create an abstract class `LoginFilter`.

    import java.io.IOException;

    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletContext;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public abstract class LoginFilter implements Filter {
        protected ServletContext servletContext;

        public void init(FilterConfig filterConfig) {
            servletContext = filterConfig.getServletContext();
        }

        public void doFilter(
            ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
            HttpServletRequest req = (HttpServletRequest) request;
            HttpServletResponse resp = (HttpServletResponse) response;

            if (!isAuth(req)) {
                resp.sendError(HttpServletResponse.SC_UNAUTHORIZED);
                return; //break the filter chain; the requested JSP/servlet will not be executed
            }

            //propagate to the next element in the filter chain; ultimately the JSP/servlet gets executed
            chain.doFilter(request, response);
        }

        public void destroy() {
            //nothing to clean up
        }

        /**
         * Logic to accept or reject access to the page, e.g. check the log in status.
         * @param req the current request, from which the HTTP session can be obtained
         * @return true when authentication is deemed valid
         */
        protected abstract boolean isAuth(HttpServletRequest req);

    }

Subclass the `LoginFilter` as a concrete class, `MemberLoginFilter`, by implementing the `isAuth` method. In this case add logic that determines whether a user is currently logged in. This post does not cover how this is done; in a nutshell, this would typically involve retrieving objects from the current HTTP session, and determining whether a user has logged in during the current session by comparing the values or states of these objects.
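
As an illustration, here is a minimal sketch of such a subclass. Storing the logged-in member in the HTTP session under the attribute name "user" is an assumption for this example (check whatever your own login code actually stores), not something mandated by the filter itself.

    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpSession;

    public class MemberLoginFilter extends LoginFilter {

        @Override
        protected boolean isAuth(HttpServletRequest req) {
            //do not create a new session if none exists yet
            HttpSession session = req.getSession(false);
            if (session == null) {
                return false;
            }
            //assumes the login code stored the logged-in member under the key "user"
            return session.getAttribute("user") != null;
        }
    }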

Now edit the `web.xml` configuration for the web application archive (WAR), to make the web app aware of the filter and of the pages it should be mapped to.

    <filter>
        <description>Requires user to log in as a member</description>
        <filter-name>MemberLoginFilter</filter-name>
        <filter-class>some.package.MemberLoginFilter</filter-class>
    </filter>
    <filter-mapping>
        <filter-name>MemberLoginFilter</filter-name>
        <url-pattern>/someRestrictedPage.jsp</url-pattern>
    </filter-mapping>

Add as many filter mappings as necessary for each page whose access should be restricted to logged-in members; a single mapping can also cover multiple pages by using a wildcard URL pattern (a path prefix ending in `/*`, or an extension pattern such as `*.jsp`).
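
For instance, the following mapping would protect everything under a hypothetical `/members/` path (the path is only an example):

    <filter-mapping>
        <filter-name>MemberLoginFilter</filter-name>
        <url-pattern>/members/*</url-pattern>
    </filter-mapping>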

Also, still editing `web.xml` configuration, tell the web app how to handle access denied error messages in a more user friendly way, by adding the following node to the settings:

    <error-page>
        <error-code>401</error-code>
        <location>/accessDenied.html</location>
    </error-page>

The HTTP 401 error code referenced here corresponds to `HttpServletResponse.SC_UNAUTHORIZED` in `LoginFilter.doFilter(..)`.
This completes the loop: when the filter chain is broken because `LoginFilter.isAuth()` returns false, instead of getting a garish HTTP 401 error page, the user will see the access denied page, which you can edit to include a user friendly message.

If `LoginFilter.isAuth()` returns true, the filter chain continues execution, and assuming that there are no more filters in the chain, or that all of the downstream filters also pass, the requested JSP or servlet gets executed and displayed.

Here are the resources I find most useful for Oracle SQL query optimisation, ordered from basic to advanced.

Guide to understanding the basics of indexing and the various merge operations, and their effects on execution plans

How-to manual from Oracle themselves on how to use autotrace in SQL*Plus

The most complete reference on Oracle hints I’ve come across

A collection of specific, and advanced, query tuning techniques

Also, here is the seminal 1979 paper from P. Selinger of IBM, which forms the roots of access plan optimisation in most current-day databases, and addresses the basic concepts that need to be taken into account when optimising queries.
Access path selection in a relational database management system


EDIT

This one here does not have anything to do directly with query optimisation, but I found that attempting to optimise queries without first understanding the algorithms behind them is a fool’s errand. So here is a great post on the basic algorithms behind nested loop join, merge join and hash join (code in C#).

In the previous posts, on socket programming and JDBC, the focus has been on programming for the server. This post will focus on client-side programming: applets and Java Web Start – with particular focus on the latter.

How to connect to a relational database from a Java program using Java Database Connectivity (JDBC); the JDBC API; some basic SQL; Object-Relational Mapping – all put together with a code example

Here is a quick introduction to socket programming in Java – using a simple demonstration of client-server communication. For the purposes of the demonstration, both the client and the server will be on the same physical machine (“localhost”). However, this can be easily changed later on.

How do we compare the relative performance of several data mining models? Previously, we discussed some basic model evaluation methods and metrics. Now we delve into more of them: ROC curves, the Kappa statistic, mean square error, relative squared error, mean absolute error, and relative absolute error are the metrics discussed below.

Building data mining models

In order to build the models used in data mining, one will need a set of data for training the learning algorithm, and then another set to evaluate the model built by the learning algorithm. Some basic evaluation methods and metrics are explored. Additionally, advanced techniques such as boosting and bagging may be applied to improve accuracy.

The transformation process

An operational database is transformed into a data warehouse through the following process:

1. Identify the facts – what are the facts that need to be obtained; the desired results of the queries.

2. Identify the dimensions – what are the dimensions (criteria) along which the facts should be filtered.

3. Draw a star schema – design a new database which consists of a central fact table with a many-to-one relationship to each of the dimension tables (see the sketch after this list).

4. Identify the dimensions' attributes – each dimension should have more than one possible value; what are the characteristics (attributes) that make them different.

5. SQL – write the structured query language that will perform the actual transformation planned above.
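
To make step 3 concrete, the star schema for the example used in the rest of this post could be visualised with the following DDL sketch. This is purely illustrative – the tables are actually created by the transformation SQL further below, and the exact columns (including a location dimension implied by the location_id used later) are assumptions.

    --dimension tables, one per identified dimension
    Create Table periodDim   (period_id Number, period_desc Varchar(20));
    Create Table source_dim  (source_id Number, source_desc Varchar(50));
    Create Table locationDim (location_id Number, location_desc Varchar(50));

    --central fact table: the aggregate facts plus one key per dimension,
    --giving the many-to-one relationship from the fact table to each dimension
    Create Table fact (
        num_cust Number,
        num_order Number,
        order_cost Number,
        source_id Number,
        period_id Number,
        location_id Number
    );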

Operational DB vs. Data warehouse

The operational database is the traditional relational database, with tables related to each other through foreign keys, and usually normalised until redundant data is minimised. This is useful when one wants to record “live” information such as transactions or logs. However, when one wants to mine the data for analytical purposes – for example, in which time window of the day are the most sales made – over an extremely large data set, how should this be done? It is possible to construct a somewhat complex query against the operational schema to get the answer, but the structure of a data warehouse is optimised for exactly this kind of question. Before that can be taken advantage of, however, the operational data first needs to be transformed and loaded into a data warehouse.
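
To make the contrast concrete, once the warehouse described below has been built, a “which period sells the most” style of question reduces to a short query over the fact and dimension tables. This is only a sketch, using the periodDim and fact tables that are created later in this post:

    --total order cost per period, using the star schema built below
    Select pd.period_desc, Sum(f.order_cost) As total_cost
    From fact f, periodDim pd
    Where f.period_id = pd.period_id
    Group By pd.period_desc
    Order By total_cost Desc;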

Transformation SQL

I will not go into detail about the planning steps here, and instead will get right into the steps necessary to achieve the transformation.

1. Create the dimension tables

There are 3 ways to create dimension tables:

i) Copy a table wholesale from the operational database. This can be done for some tables which fit the bill of the dimension tables – they usually have the same name as the identified dimension, and the attributes identified for the dimension already exist as columns of the operational table.

ii) Create tables from scratch, manually inserting values. This is done for dimensions which do not exist in the operational database at all – for example, a time-of-day dimension for data that should be sliced according to morning, afternoon or night. In most operational databases (at least in normalised ones), the tables will not have columns that carry this kind of information about a time/date value.

    Create Table periodDim --create a period dimension table
    (period_id Number,
     period_desc Varchar(20));

    Insert Into periodDim Values (1, 'Summer');
    Insert Into periodDim Values (2, 'Autumn');
    Insert Into periodDim Values (3, 'Winter');
    Insert Into periodDim Values (4, 'Spring');

iii) Extract values from an operational database table using `Select Distinct`. This is done when the data exists in some form in the operational database, but cannot be used as-is and must be extracted in a special manner. This is the converse of situation (ii): the data could potentially have been normalised out into its own table, but the values were instead used directly.

    Create Table source_dim
    As (Select Distinct
        Dense_Rank() Over (Order By o.order_source) As source_id,
        o.order_source As source_desc
        From op_db.orders o); --operational orders table ("order" alone would clash with an Oracle reserved word)

Note that `Dense_Rank() Over`, which is normally used for OLAP purposes, is used here to perform the simple function of assigning unique primary keys for this dimension table. Why `Dense_Rank` instead of `Rank`? You could really use either without issue; it is just that `Dense_Rank` gives running numbers without gaps, whereas `Rank` would leave gaps when used together with `Distinct`.
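
For example, with hypothetical order_source values of MAIL once, PHONE three times and WEB twice, the two functions number the distinct sources differently:

    --hypothetical data, purely to illustrate the gap behaviour
    Select Distinct
        o.order_source,
        Rank()       Over (Order By o.order_source) As with_rank,      --MAIL=1, PHONE=2, WEB=5 (gap)
        Dense_Rank() Over (Order By o.order_source) As with_dense_rank --MAIL=1, PHONE=2, WEB=3 (no gaps)
    From op_db.orders o;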

2. Create a temporary fact table

There are two main steps to this: first, create the actual table by joining the necessary tables from the operational database; second, create the links from the temporary fact table to the dimension tables.

i) The first step is a simple join. Select all the attributes that are needed to obtain the attributes in the fact table. This includes the values needed to determine how each row relates to each dimension, plus any of the “fact” attributes, which are usually aggregate values. “Aggregate” here refers to a single figure that represents a larger set of figures: sum, count, average, minimum, maximum, etc.
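
A sketch of what this join might look like, assuming hypothetical operational tables `op_db.orders` and `op_db.order_item` that carry the columns used in the rest of this post (order_date and order_source are the values later used to link rows to the period and source dimensions; the real schema will of course differ):

    Create Table temp As (
        Select o.order_id, o.cust_id, o.order_date, o.order_source, o.location_id,
               oi.item_price, oi.order_qty
        From op_db.orders o, op_db.order_item oi
        Where o.order_id = oi.order_id
    );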

ii) The next step is to add columns for the primary keys of the dimension tables and populate them. For example, the temp table may already have an order date column, while the period dimension table has four entries, one for each season of the year; each row of the temp table needs a period_id that matches its date. There are two ways to populate such a column:

    --add the new key column to the temporary table first
    Alter Table temp Add (period_id Number);

    --option 1: manually insert the period IDs
    Update temp Set period_id=1 --summer
    Where (to_char(order_date, 'MON') In ('DEC', 'JAN', 'FEB'));
    Update temp Set period_id=2 --autumn
    Where (to_char(order_date, 'MON') In ('MAR', 'APR', 'MAY'));
    Update temp Set period_id=3 --winter
    Where (to_char(order_date, 'MON') In ('JUN', 'JUL', 'AUG'));
    Update temp Set period_id=4 --spring
    Where (to_char(order_date, 'MON') In ('SEP', 'OCT', 'NOV'));

    --option 2: automatically derive the period IDs by comparing against date ranges
    Update temp t
    Set t.period_id=(
        Select pd.period_id
        From periodDim pd
        Where (to_date(to_char(t.order_date, 'MMDD'), 'MMDD') <= pd.end_date) And
              (to_date(to_char(t.order_date, 'MMDD'), 'MMDD') >= pd.start_date)
    );

Note that the latter involves an additional step where the “start_date” and “end_date” columns are defined for the period dimension table. In this case, since there are a finite number of seasons (four), there is a choice – either method will work fine. However, when there is a theoretically infinite number of possible entries in a dimension table, or if there is a finite but unmanageably large number of such entries, then the latter method is the only practical way to do this.
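
For completeness, that additional step might look something like the following; the exact dates are assumptions, and a period which wraps around the year end (like the December-to-February summer above) would need to be split into two rows or handled separately for the range comparison to work.

    --give each period a date range that the scrubbed order dates can be compared against
    Alter Table periodDim Add (start_date Date, end_date Date);

    Update periodDim Set start_date=to_date('0301','MMDD'), end_date=to_date('0531','MMDD')
    Where period_id=2; --autumn
    Update periodDim Set start_date=to_date('0601','MMDD'), end_date=to_date('0831','MMDD')
    Where period_id=3; --winter
    Update periodDim Set start_date=to_date('0901','MMDD'), end_date=to_date('1130','MMDD')
    Where period_id=4; --spring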

Also, note that I have converted a date value to a string, and then back again to a date – this has been done on purpose to scrub the date values so that only their month and day components are compared. Otherwise, the differing year components would mean comparing apples to oranges.

3. Create the fact table from the template

This is the easiest step of them all (including the planning steps), because if everything has been done right up until now, then this is trivial. However, if something has not quite gone right in the previous steps, this is probably where you will notice and then have to backtrack. We just copy from the temporary fact table, but instead of selecting all columns, we select only the relevant ones (identified in the star schema): the primary keys of the dimension tables, plus the aggregate values that are the facts presented by the fact table.

    Create Table fact
    As (
        Select Count(Distinct t.cust_id) As num_cust, --aggregate values
               Count(Distinct t.order_id) As num_order,
               Sum(t.item_price * t.order_qty) As order_cost,
               t.source_id, t.period_id, t.location_id --the PKs of the dimension tables
        From temp t
        Group By t.source_id, t.period_id, t.location_id --group by ALL of the non-aggregate values
    );

A note on PKs and FKs

Primary keys and foreign keys are necessary in an operational database because, in addition to being normalised, it needs to support all four types of operations (Create, Read, Update and Delete). In a data warehouse, that is not the case – after creation, a data warehouse should only be used for Read operations.