
Java 7 Outlook

Dolphin will not ship in 2007, though it may well ship next year. Work is under way, and some features have already debuted early as standard extensions, at least in beta releases.

It is an unfortunate fact that adding features to a language is far easier than removing them. Over time, languages inevitably grow more complex. Even features that are perfectly good in isolation can combine with one another to become problematic.

Unfortunately, the Java community does not yet seem to have learned this lesson, common as it is. A novel, exciting syntax can be irresistible to language designers even when it solves no real problem. So people are now talking about new Java 7 features including closures, multiple inheritance, and operator overloading.

The odds are about even that closures will appear in a Java 7 beta this year, and operator overloading just might make it in as well. Multiple inheritance, however, will not happen: too much of Java rests on a single-inheritance hierarchy, and there is no plausible way to graft multiple inheritance onto the language.

Numerous syntax proposals are floating around right now, some reasonable, some not. Many of them focus on replacing methods such as getFoo() with an -> operator.

Lists

One possible way to access collections is with array-like syntax. For example, instead of writing this:

List content = new LinkedList();
content.add(0, "Fred");
content.add(1, "Barney");
String name = (String) content.get(0);

you would write this:

List content = new LinkedList();
content[0] = "Fred";
content[1] = "Barney";
String name = content[0];

Another option is to allow array initializer syntax for lists:

LinkedList content = {"Fred", "Barney", "Wilma", "Betty"}

Both of these proposals could be implemented with a little compiler magic and no changes to the virtual machine. Neither one invalidates or redefines any existing source code, which is an important criterion for any new syntax.
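Until any such literal syntax exists, the closest approximation in today's Java code is an array initializer wrapped with Arrays.asList. A small sketch (the class name ListLiterals is invented for illustration):

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

public class ListLiterals {
    public static void main(String[] args) {
        // Arrays.asList() turns an array literal into a fixed-size List;
        // copying it into a LinkedList makes the list modifiable.
        List<String> content =
                new LinkedList<String>(Arrays.asList("Fred", "Barney", "Wilma", "Betty"));
        content.add("Pebbles");
        System.out.println(content.get(0)); // Fred
    }
}
```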

One language feature that really would make a difference to developer productivity is built-in primitives for managing the tables, trees, and maps you deal with when working with XML and SQL. JavaScript's E4X and Microsoft's Cω and LINQ projects show how such built-ins can improve development, but the Java platform so far has no answer on this front.

Properties

Consider syntax for property access. One proposal is simply to use -> to invoke getFoo and setFoo. So instead of writing this:

Point p = new Point();
p.setX(56);
p.setY(87);
int z = p.getX();

you would write this:

Point p = new Point();
p->X = 56;
p->Y = 87;
int z = p->X;

Other symbols, including . and #, have also been suggested as alternatives to ->.

Going forward, the relevant fields might, or might not, be explicitly marked as properties in the Point class:

public class Point {

  public int property x;
  public int property y;

}

Personally, none of this excites me. I would rather see the Java platform adopt the Eiffel-style approach, in which you really do use public fields; but if a getter and setter are defined with the same name as the field, reads and writes of the field are automatically routed through those methods. Less syntax, and more flexible.
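For comparison, here is the conventional JavaBeans pattern that all of these proposals abbreviate, written out in full as it must be today:

```java
public class Point {
    private int x;
    private int y;

    // The fields stay private; all access goes through explicit accessors.
    public int getX() { return x; }
    public void setX(int x) { this.x = x; }
    public int getY() { return y; }
    public void setY(int y) { this.y = y; }

    public static void main(String[] args) {
        Point p = new Point();
        p.setX(56);
        p.setY(87);
        System.out.println(p.getX()); // 56
    }
}
```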

Arbitrary-precision arithmetic

Operator overloading

Using standard mathematical symbols is not the same thing as operator overloading, at least not the kind that causes trouble in C++. The + sign and the other operators would still have one clear meaning in any program; their meaning would not be redefined from program to program. Code becomes easier to read when the same syntax is used for similar operations, and harder to read when redefinable syntax can mean different things in different programs.

Another proposal would substitute operators for method calls on BigDecimal and BigInteger. For example, where unlimited-precision arithmetic today must be coded like this:

BigInteger low  = BigInteger.ONE;
BigInteger high = BigInteger.ONE;
for (int i = 0; i < 500; i++) {
  System.out.print(low);
  BigInteger temp = high;
  high = high.add(low);
  low = temp;
}

it could be written much more clearly as:

BigInteger low  = 1;
BigInteger high = 1;
for (int i = 0; i < 500; i++) {
  System.out.print(low);
  BigInteger temp = high;
  high = high + low;
  low = temp;
}

These classes could be abused, with a corresponding performance hit, but the proposal is not especially intrusive.
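For reference, here is the method-call version above as a complete, runnable program. The loop is computing Fibonacci numbers, so it is factored into a fib() helper here (an invented name) to make it testable:

```java
import java.math.BigInteger;

public class Fibonacci {

    // Returns the n-th Fibonacci number (fib(0) = fib(1) = 1), using the
    // method calls the current BigInteger API requires.
    static BigInteger fib(int n) {
        BigInteger low = BigInteger.ONE;
        BigInteger high = BigInteger.ONE;
        for (int i = 0; i < n; i++) {
            BigInteger temp = high;
            high = high.add(low); // would be "high + low" under the proposal
            low = temp;
        }
        return low;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            System.out.print(fib(i) + " "); // 1 1 2 3 5 8 13 21 34 55
        }
        System.out.println();
    }
}
```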

Trading JARs for JAMs

Java 7 should finally fix the assorted class loader and classpath problems that have irritated Java developers for so long. Sun is addressing them with the Java Module System. Instead of .jar files, data will be stored in .jam files, a sort of "superjar" that holds all of a module's code and metadata. Most importantly, the Java Module System supports versioning for the first time, so a program can state that it requires Xerces 2.7.1, not 2.6. Dependencies can be declared as well: a program in a JAM can specify, for example, that it requires JDOM. It will also be possible to load a single module without loading everything else. Finally, a centralized repository will serve up many versions of many different JAMs, and applications will pick out just the pieces they need. Once the Java Module System is working, the ritual of copying files into jre/lib/ext should become a thing of the past.

Package access

Java 7 will probably loosen access restrictions a little, either by letting subpackages access package-protected fields and methods of classes in their superpackages, or by letting a package explicitly expose certain package-protected members to its subpackages. Either way, this makes it simpler to divide an application across several packages, and it improves testability: if the unit tests live in a subpackage, methods no longer have to be made public just so they can be tested.
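Today, package access stops hard at the package boundary: members with default (package) access are visible only to classes in exactly the same package, as this sketch shows (Counter and CounterTest are invented names, both in the unnamed package):

```java
// Both classes sit in the same (unnamed) package, so CounterTest can use
// Counter's package-access members without their being public.
class Counter {
    int count;                    // package access: no modifier
    void increment() { count++; } // package access: no modifier
}

public class CounterTest {
    public static void main(String[] args) {
        Counter c = new Counter();
        c.increment();
        c.increment();
        System.out.println(c.count); // 2
    }
}
```

A class in any other package, including a "subpackage," gets a compile error on both `c.count` and `c.increment()`; that is the restriction the Java 7 proposals would relax.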

Filesystem access

Filesystem access has been a sore spot for the Java platform since 1995. More than a decade on, there is still no reliable, cross-platform way to perform basic operations such as copying or moving a file. Fixing this has been an open issue through the last three JDK releases (1.4, 5, and 6). Sadly, the boring but necessary APIs for moving and copying files keep losing out to flashier, less broadly useful operations such as memory-mapped I/O. Perhaps JSR 203 will finally resolve this and give us a workable cross-platform filesystem API, though the working group is likely to devote much of its attention to true asynchronous I/O. We should know by this time next year.
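In the meantime, copying a file portably still means hand-written stream plumbing along these lines. This is a minimal sketch only; real code would also have to worry about metadata such as permissions and timestamps, which is part of what makes the gap so annoying:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class FileCopy {

    // Copy a file by streaming its bytes; java.io offers no single copy() call.
    static void copy(File src, File dest) throws IOException {
        InputStream in = new FileInputStream(src);
        try {
            OutputStream out = new FileOutputStream(dest);
            try {
                byte[] buffer = new byte[8192];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
            } finally {
                out.close();
            }
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File src = File.createTempFile("demo", ".txt");
        FileWriter w = new FileWriter(src);
        w.write("hello");
        w.close();
        File dest = File.createTempFile("demo", ".copy");
        copy(src, dest);
        System.out.println(src.length() == dest.length()); // true
    }
}
```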

Experimentation

Whatever changes come, it would be better if they were implemented in the open source world first, so we could see how much, or how little, difference they actually make. Sun's Peter Ahe has launched the Kitchen Sink Project at java.net; the goal is to fork the javac compiler repeatedly in order to try out many different ideas.






Client GUIs

Although many people haven't noticed, the Java platform has had a real presence on the desktop for four or five years now. Some high-quality desktop applications, including RSSOwl, Limewire, Azureus, Eclipse, NetBeans, and CyberDuck, are written in Java code. These applications are built with almost every GUI toolkit available: Swing, AWT, SWT, and even platform toolkits such as Cocoa on Mac OS X. It is too early to declare any one of these toolkits the winner, but Swing is gaining better support than the rest for producing applications with a native feel.

Swing is still challenging to develop with, but the situation should improve next year with the arrival of the Swing Application Framework, currently under development in the Java Community Process as JSR 296. Here is the JSR's own description:

Well-written Swing applications tend to share the same core elements for managing startup and shutdown and for handling resources, actions, and session state. Each new application reinvents all of these core elements from scratch: Java SE provides no support for structuring an application, which is why developers flounder when they try to build anything bigger than the examples in the SE documentation.

This specification aims to fill that gap by defining the basic structure of a Swing application. It will define a small, extensible set of classes, a "framework," that captures the infrastructure common to most desktop applications.

The Swing Application Framework will support most typical applications while letting developers plug in customizations at a few points, such as startup and shutdown. It will handle saving and restoring windows and other state across application shutdown and startup. Finally, it will let developers write asynchronous actions that run outside the Swing event dispatch thread.

Efforts to improve JavaBeans and Swing are also under way. JSR 295 defines a standard way to bind beans together, so that when one bean is updated, the change is automatically reflected in another. A GUI grid bean, for example, updates automatically when the database bean it is tied to changes.
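You can hand-wire this kind of binding today with the java.beans event machinery; JSR 295 essentially aims to standardize and generate the plumbing sketched below (Model and View are invented names for illustration):

```java
import java.beans.PropertyChangeEvent;
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

public class BindingDemo {

    // A bound JavaBeans property: setValue() fires an event on every change.
    static class Model {
        private final PropertyChangeSupport pcs = new PropertyChangeSupport(this);
        private String value = "";

        public void addPropertyChangeListener(PropertyChangeListener l) {
            pcs.addPropertyChangeListener(l);
        }

        public void setValue(String v) {
            String old = this.value;
            this.value = v;
            pcs.firePropertyChange("value", old, v);
        }

        public String getValue() { return value; }
    }

    static class View {
        String text = "";
    }

    public static void main(String[] args) {
        Model model = new Model();
        final View view = new View();
        // The hand-written "binding": copy the model's value into the view
        // whenever it changes. JSR 295 would generate glue like this.
        model.addPropertyChangeListener(new PropertyChangeListener() {
            public void propertyChange(PropertyChangeEvent e) {
                view.text = (String) e.getNewValue();
            }
        });
        model.setValue("updated");
        System.out.println(view.text); // updated
    }
}
```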

Finally, JSR 303 is working on an XML-based validation language for specifying the values a given bean may take: that an int property must fall between 1 and 10, say, or that a String property must contain a legal e-mail address. Expect betas of these features toward the end of this year, with the work wrapping up for Java 7 about a year from now.
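Until JSR 303 arrives, such constraints are typically enforced by hand in setters, roughly like this (BoundedBean is an invented example of the bounded-int case):

```java
// A hand-rolled sketch of the kind of constraint JSR 303 would let you
// declare: an int property restricted to a range, checked in the setter.
public class BoundedBean {
    private int rating;

    public void setRating(int rating) {
        if (rating < 1 || rating > 10) {
            throw new IllegalArgumentException("rating must be between 1 and 10");
        }
        this.rating = rating;
    }

    public int getRating() { return rating; }

    public static void main(String[] args) {
        BoundedBean b = new BoundedBean();
        b.setRating(7);
        System.out.println(b.getRating()); // 7
    }
}
```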






The Java platform as a desktop language

Some programmers write desktop applications in Java code because it is the language they prefer, but most do it to cover multiple platforms. Interest in the Java platform as a desktop language therefore tracks the number of non-Microsoft desktops. Let's look at Java programming on the three major desktops.

Windows

Swing will keep improving its Windows look and feel over the next year, especially with the move to open source development. Eventually, pure Java programs such as LimeWire will look more natural on Windows. Still, the native language of Windows application development is C# (with some C and C++), and the framework will be .NET. Java code will never be deeply woven into the Windows ecosystem.

Macintosh

Like Microsoft, Apple Inc. has all but abandoned Java code. Apple favors Objective-C and Cocoa, but the result is the same: Mac-focused developers keep drifting away from Java code toward Apple's preferred language and environment.

On the positive side, although Apple no longer supports Java code for proprietary APIs such as QuickTime and Cocoa, the Apple VM is better than it has ever been, and Apple's Java 6 port should ship soon. It won't be open source (unlike Sun's JDK), but open source programmers will still get its bugs fixed.

Linux

The GPL license makes it possible to bundle Java code with even the purest open source Linux distributions, which goes a long way toward making the Java platform an attractive language for Linux development. Had this happened five years ago, the Linux community might never have had to struggle along in C, and Mono would have been unnecessary.

The Java bindings for Gnome and KDE should attract much more attention in the coming year. Expect at least one major project to build a Linux GUI program in Java code rather than C, C++, or C#.






Ruby wins the race!

Bloatware

JavaScript is already bundled with JDK 6, and more languages will be pushed for JDK 7. This smells like bloat, and there is no way for Sun to stop it. Pick BeanShell, and the Groovy folks will want in; let Groovy in, and the Ruby users will want in; once Ruby is in, will Python sit still? The standard JDK is already far too big. Supporting multiple scripting languages is one thing; bundling them all is too much. The solution is to support them all but bundle none of them.

On the positive side, Sun is looking at ways to reduce the initial download size and application startup time, particularly for applets and Java Web Start applications. The huge class library would stay on the server, and only the parts actually needed would be downloaded.

The world would be a dull place if we all spoke just one language. The Java platform is a superb choice for application development, but it has never been a good fit for small programs and macros. Java 6 recognizes this: it adds the javax.script package for integrating with scripting languages such as BeanShell, Python, Perl, Ruby, ECMAScript, and Groovy, and work is under way to add an invokedynamic instruction to the Java VM so that it can compile dynamically typed languages directly.
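In practice, the javax.script API looks like this. Which engines are actually installed varies by JDK (Java 6 bundles a JavaScript engine; other JDKs may not), so the lookup is written to tolerate a missing engine:

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class ScriptDemo {
    public static void main(String[] args) throws Exception {
        ScriptEngineManager manager = new ScriptEngineManager();
        // JDK 6 bundles a JavaScript engine; other JDK versions may not,
        // so this lookup can legitimately return null.
        ScriptEngine js = manager.getEngineByName("javascript");
        if (js != null) {
            System.out.println(js.eval("2 + 3"));
        } else {
            System.out.println("no JavaScript engine installed");
        }
    }
}
```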

Though it is not my personal favorite, for 2007 my money is on Ruby. Python code seems much cleaner and easier to understand than Ruby code, and most Java programmers would probably agree; but Python arrived at the wrong time. Many developers had to choose between learning Python and learning Java code, and most chose Java code. By the time they had absorbed Java syntax and were ready for another language, they naturally reached for the language of the future rather than the past, and that language looks like Ruby. Most importantly, Ruby has an absolute killer application in Ruby on Rails, whose simplicity looks very attractive to pragmatic Java Enterprise Edition (JEE) developers.

Beyond Rails, the JRuby project integrates with existing Java code and libraries better than the other scripting languages do. In fact, JRuby may well beat out the standard Ruby interpreter to become the preferred platform not just for Java programmers using Ruby but for Ruby programmers in general. That is a good thing. That Ruby has the momentum and Python does not is sad, but true.

The other scripting languages will continue to fade. Perl is too old-fashioned and fits modern applications poorly. Groovy lacks a clear vision, tending toward computer-science specialist appeal rather than usability or familiarity. BeanShell, Jelly, and several others are no longer actively maintained. By this time next year, the contest to be the Java programmer's scripting language will have been decided, and Ruby looks like the winner.






IDEs keep getting better

2006 was a brutal year in the IDE wars. Stung by Eclipse, Sun poured its energy and resources into NetBeans, which finally became a serious competitor. In some ways, NetBeans appeared to surpass Eclipse by the end of 2006: it has a much nicer look and feel and much better tools for designing GUIs. What NetBeans lacks is a community as active as Eclipse's. Far more plug-ins and third-party products are based on Eclipse than on NetBeans, and that gap looks set to keep widening.

Eclipse is hard at work on version 3.3, which should ship in 2007, and Sun will release NetBeans 6 this year. Neither looks like a major release: just small features added here and there, bugs fixed, and user interfaces polished.

NetBeans will keep taking market share from Eclipse, and it still has plenty of room to grow; Sun's insistence on bundling NetBeans with JDK downloads certainly doesn't hurt. By the end of the year, the two IDEs may well split the market between them.

Meanwhile, IntelliJ IDEA users will ignore all of this, secure in the belief that they are using the best Java IDE available. Most users, however, will balk at a price tag of over $500, and IDEA's market share will stay around 5%.






Java Enterprise Edition

No part of Java programming has been as successful, or as loathed, as JEE. It is the technology everyone loves to hate: complex, confusing, and heavyweight. No part of Java programming has inspired so many attempts at replacement: Spring, Hibernate, Restlet, aspects, Struts, and on and on. Even so, nearly every Java programmer ends up facing JEE sooner or later, so Sun has to keep paying attention to it.

One trend I see in the enterprise world is a craving for simplicity. Of the many frameworks available, the small and simple ones are winning favor. More and more customers are rejecting large parts of the JEE stack, and that trend will continue. Instead, customers are turning to simpler frameworks such as Spring, or moving to Ruby on Rails. The same desire for simpler, more understandable systems is fueling interest in service-oriented architecture (SOA) and Representational State Transfer (REST).

The trend toward simplicity will continue in 2007. Energized by Rails, many hope to repeat its success in other languages: Python (TurboGears), Groovy (Grails), Java (Sails). Perhaps one of them will succeed; if not, there will be nothing dramatically new. Mostly, businesses will keep using the SOA, REST, and Rails they already have.






Java Micro Edition

Can the appetite for small and simple extend to the embedded space? The Java platform had great success on small devices over the past year, and 2007 should continue that momentum. First, look for version 3 of the Mobile Information Device Profile (MIDP). Notably, it will allow several MIDlets to run in a single VM, with more than one able to run in the background. Also expect encrypted Record Management System (RMS) stores and IPv6 support.

The Scalable 2D Vector Graphics API 2.0 for Java ME, now under development, expands the animation capabilities available on many devices. Beyond SVG animation, it also handles streaming audio and video. If the mobile networks open up, that really matters: think YouTube on your cell phone. (Of course, if the networks stay closed, it is just one more two-inch corporate advertisement nobody wants to watch. I doubt this will happen in the United States, but it might in Europe.)

This will also be the year mobile developers get XML APIs for Java ME: a subset of SAX, DOM, StAX, and JAXP designed to fit in a phone's limited memory. Many people think raw XML is simply a poor fit for phones; this year we will find out whether they are right.

Despite all the good news, Apple's iPhone looms as a threat to the Java platform's position as a mobile development platform. Six months out from its debut, the iPhone is already claiming the spotlight. The catch is that it is a closed platform: it will not run Java code, and even by cell phone industry standards it seems unlikely to open up. That is terrible news for anyone hoping to sell third-party applications for mobile phones, PDAs, and personal communicators.






Summary

Thanks to the JDK going open source, 2007 should be the most interesting year in the history of Java programming. Until now, the Java platform has gone only where Sun's goals and investments took it, but that is about to change. With the full power of the developer community behind it, Java programming can sail anywhere. Developers can do more with Java code than ever before, on the desktop, the server, and embedded devices, and all of it will accelerate. Some of those engines will stall from time to time; what is good will survive, and the rest will fall away. If there is a part of the Java platform you dislike, or something that has always bothered you, fire up your IDE and start hacking.

So start your compilers.

2007/11/29 11:43
The previous four chapters of this book gave a broad overview of Java's architecture. They showed how the Java virtual machine fits into the overall architecture relative to other components such as the language and API. The remainder of this book will focus more narrowly on the Java virtual machine. This chapter gives an overview of the Java virtual machine's internal architecture.

The Java virtual machine is called "virtual" because it is an abstract computer defined by a specification. To run a Java program, you need a concrete implementation of the abstract specification. This chapter describes primarily the abstract specification of the Java virtual machine. To illustrate the abstract definition of certain features, however, this chapter also discusses various ways in which those features could be implemented.

What is a Java Virtual Machine?

To understand the Java virtual machine you must first be aware that you may be talking about any of three different things when you say "Java virtual machine." You may be speaking of:

  • the abstract specification,
  • a concrete implementation, or
  • a runtime instance.
The abstract specification is a concept, described in detail in the book: The Java Virtual Machine Specification, by Tim Lindholm and Frank Yellin. Concrete implementations, which exist on many platforms and come from many vendors, are either all software or a combination of hardware and software. A runtime instance hosts a single running Java application.

Each Java application runs inside a runtime instance of some concrete implementation of the abstract specification of the Java virtual machine. In this book, the term "Java virtual machine" is used in all three of these senses. Where the intended sense is not clear from the context, one of the terms "specification," "implementation," or "instance" is added to the term "Java virtual machine".

The Lifetime of a Java Virtual Machine

A runtime instance of the Java virtual machine has a clear mission in life: to run one Java application. When a Java application starts, a runtime instance is born. When the application completes, the instance dies. If you start three Java applications at the same time, on the same computer, using the same concrete implementation, you'll get three Java virtual machine instances. Each Java application runs inside its own Java virtual machine.

A Java virtual machine instance starts running its solitary application by invoking the main() method of some initial class. The main() method must be public, static, return void, and accept one parameter: a String array. Any class with such a main() method can be used as the starting point for a Java application.

For example, consider an application that prints out its command line arguments:

// On CD-ROM in file jvm/ex1/Echo.java
class Echo {

    public static void main(String[] args) {
        int len = args.length;
        for (int i = 0; i < len; ++i) {
            System.out.print(args[i] + " ");
        }
        System.out.println();
    }
}

You must in some implementation-dependent way give a Java virtual machine the name of the initial class that has the main() method that will start the entire application. One real world example of a Java virtual machine implementation is the java program from Sun's Java 2 SDK. If you wanted to run the Echo application using Sun's java on Windows 98, for example, you would type in a command such as:

java Echo Greetings, Planet.

The first word in the command, "java," indicates that the Java virtual machine from Sun's Java 2 SDK should be run by the operating system. The second word, "Echo," is the name of the initial class. Echo must have a public static method named main() that returns void and takes a String array as its only parameter. The subsequent words, "Greetings, Planet.," are the command line arguments for the application. These are passed to the main() method in the String array in the order in which they appear on the command line. So, for the previous example, the contents of the String array passed to main() in Echo are: args[0] is "Greetings," and args[1] is "Planet."

The main() method of an application's initial class serves as the starting point for that application's initial thread. The initial thread can in turn fire off other threads.

Inside the Java virtual machine, threads come in two flavors: daemon and non-daemon. A daemon thread is ordinarily a thread used by the virtual machine itself, such as a thread that performs garbage collection. The application, however, can mark any threads it creates as daemon threads. The initial thread of an application--the one that begins at main()--is a non-daemon thread.

A Java application continues to execute (the virtual machine instance continues to live) as long as any non-daemon threads are still running. When all non-daemon threads of a Java application terminate, the virtual machine instance will exit. If permitted by the security manager, the application can also cause its own demise by invoking the exit() method of class Runtime or System.
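The daemon flag is controlled through Thread.setDaemon(), which must be called before the thread is started. A brief sketch of the lifetime rule described above:

```java
public class DaemonDemo {
    public static void main(String[] args) {
        Thread worker = new Thread(new Runnable() {
            public void run() {
                // A background task that would otherwise run forever.
                while (true) {
                    try {
                        Thread.sleep(100);
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            }
        });
        // Must be set before start(): a daemon thread does not keep the
        // virtual machine instance alive.
        worker.setDaemon(true);
        worker.start();
        System.out.println(worker.isDaemon()); // true
        // When main() returns, the application's only non-daemon thread
        // terminates, so the VM exits even though the daemon worker is
        // still sleeping.
    }
}
```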

In the Echo application shown previously, the main() method doesn't invoke any other threads. After it prints out the command line arguments, main() returns. This terminates the application's only non-daemon thread, which causes the virtual machine instance to exit.

The Architecture of the Java Virtual Machine

In the Java virtual machine specification, the behavior of a virtual machine instance is described in terms of subsystems, memory areas, data types, and instructions. These components describe an abstract inner architecture for the abstract Java virtual machine. The purpose of these components is not so much to dictate an inner architecture for implementations. It is more to provide a way to strictly define the external behavior of implementations. The specification defines the required behavior of any Java virtual machine implementation in terms of these abstract components and their interactions.

Figure 5-1 shows a block diagram of the Java virtual machine that includes the major subsystems and memory areas described in the specification. As mentioned in previous chapters, each Java virtual machine has a class loader subsystem: a mechanism for loading types (classes and interfaces) given fully qualified names. Each Java virtual machine also has an execution engine: a mechanism responsible for executing the instructions contained in the methods of loaded classes.



Figure 5-1. The internal architecture of the Java virtual machine.

When a Java virtual machine runs a program, it needs memory to store many things, including bytecodes and other information it extracts from loaded class files, objects the program instantiates, parameters to methods, return values, local variables, and intermediate results of computations. The Java virtual machine organizes the memory it needs to execute a program into several runtime data areas.

Although the same runtime data areas exist in some form in every Java virtual machine implementation, their specification is quite abstract. Many decisions about the structural details of the runtime data areas are left to the designers of individual implementations.

Different implementations of the virtual machine can have very different memory constraints. Some implementations may have a lot of memory in which to work, others may have very little. Some implementations may be able to take advantage of virtual memory, others may not. The abstract nature of the specification of the runtime data areas helps make it easier to implement the Java virtual machine on a wide variety of computers and devices.

Some runtime data areas are shared among all of an application's threads and others are unique to individual threads. Each instance of the Java virtual machine has one method area and one heap. These areas are shared by all threads running inside the virtual machine. When the virtual machine loads a class file, it parses information about a type from the binary data contained in the class file. It places this type information into the method area. As the program runs, the virtual machine places all objects the program instantiates onto the heap. See Figure 5-2 for a graphical depiction of these memory areas.



Figure 5-2. Runtime data areas shared among all threads.

As each new thread comes into existence, it gets its own pc register (program counter) and Java stack. If the thread is executing a Java method (not a native method), the value of the pc register indicates the next instruction to execute. A thread's Java stack stores the state of Java (not native) method invocations for the thread. The state of a Java method invocation includes its local variables, the parameters with which it was invoked, its return value (if any), and intermediate calculations. The state of native method invocations is stored in an implementation-dependent way in native method stacks, as well as possibly in registers or other implementation-dependent memory areas.

The Java stack is composed of stack frames (or frames). A stack frame contains the state of one Java method invocation. When a thread invokes a method, the Java virtual machine pushes a new frame onto that thread's Java stack. When the method completes, the virtual machine pops and discards the frame for that method.
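These per-thread frames can be observed from Java code through Thread.getStackTrace(), which returns one element for each frame on the current thread's Java stack. The class and method names below are invented for illustration:

```java
public class StackDemo {

    // One element per frame currently on this thread's Java stack.
    static int depth() {
        return Thread.currentThread().getStackTrace().length;
    }

    // Calling depth() through an extra method pushes one more frame
    // onto the stack before depth() takes its snapshot.
    static int nested() {
        return depth();
    }

    public static void main(String[] args) {
        int direct = depth();
        int viaNested = nested();
        System.out.println(viaNested > direct); // true: nested() added a frame
    }
}
```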

The Java virtual machine has no registers to hold intermediate data values. The instruction set uses the Java stack for storage of intermediate data values. This approach was taken by Java's designers to keep the Java virtual machine's instruction set compact and to facilitate implementation on architectures with few or irregular general purpose registers. In addition, the stack-based architecture of the Java virtual machine's instruction set facilitates the code optimization work done by just-in-time and dynamic compilers that operate at run-time in some virtual machine implementations.

See Figure 5-3 for a graphical depiction of the memory areas the Java virtual machine creates for each thread. These areas are private to the owning thread. No thread can access the pc register or Java stack of another thread.



Figure 5-3. Runtime data areas exclusive to each thread.

Figure 5-3 shows a snapshot of a virtual machine instance in which three threads are executing. At the instant of the snapshot, threads one and two are executing Java methods. Thread three is executing a native method.

In Figure 5-3, as in all graphical depictions of the Java stack in this book, the stacks are shown growing downwards. The "top" of each stack is shown at the bottom of the figure. Stack frames for currently executing methods are shown in a lighter shade. For threads that are currently executing a Java method, the pc register indicates the next instruction to execute. In Figure 5-3, such pc registers (the ones for threads one and two) are shown in a lighter shade. Because thread three is currently executing a native method, the contents of its pc register--the one shown in dark gray--is undefined.

Data Types

The Java virtual machine computes by performing operations on certain types of data. Both the data types and operations are strictly defined by the Java virtual machine specification. The data types can be divided into a set of primitive types and a reference type. Variables of the primitive types hold primitive values, and variables of the reference type hold reference values. Reference values refer to objects, but are not objects themselves. Primitive values, by contrast, do not refer to anything. They are the actual data themselves. You can see a graphical depiction of the Java virtual machine's families of data types in Figure 5-4.



Figure 5-4. Data types of the Java virtual machine.

All the primitive types of the Java programming language are primitive types of the Java virtual machine. Although boolean qualifies as a primitive type of the Java virtual machine, the instruction set has very limited support for it. When a compiler translates Java source code into bytecodes, it uses ints or bytes to represent booleans. In the Java virtual machine, false is represented by integer zero and true by any non-zero integer. Operations involving boolean values use ints. Arrays of boolean are accessed as arrays of byte, though they may be represented on the heap as arrays of byte or as bit fields.

The primitive types of the Java programming language other than boolean form the numeric types of the Java virtual machine. The numeric types are divided between the integral types: byte, short, int, long, and char, and the floating-point types: float and double. As with the Java programming language, the primitive types of the Java virtual machine have the same range everywhere. A long in the Java virtual machine always acts like a 64-bit signed twos complement number, independent of the underlying host platform.

The Java virtual machine works with one other primitive type that is unavailable to the Java programmer: the returnAddress type. This primitive type is used to implement finally clauses of Java programs. The use of the returnAddress type is described in detail in Chapter 18, "Finally Clauses."

The reference type of the Java virtual machine is cleverly named reference. Values of type reference come in three flavors: the class type, the interface type, and the array type. All three types have values that are references to dynamically created objects. The class type's values are references to class instances. The array type's values are references to arrays, which are full-fledged objects in the Java virtual machine. The interface type's values are references to class instances that implement an interface. One other reference value is the null value, which indicates the reference variable doesn't refer to any object.

The Java virtual machine specification defines the range of values for each of the data types, but does not define their sizes. The number of bits used to store each data type value is a decision of the designers of individual implementations. The ranges of the Java virtual machine's data types are shown in Table 5-1. More information on the floating point ranges is given in Chapter 14, "Floating Point Arithmetic."

Type Range
byte 8-bit signed two's complement integer (-2^7 to 2^7 - 1, inclusive)
short 16-bit signed two's complement integer (-2^15 to 2^15 - 1, inclusive)
int 32-bit signed two's complement integer (-2^31 to 2^31 - 1, inclusive)
long 64-bit signed two's complement integer (-2^63 to 2^63 - 1, inclusive)
char 16-bit unsigned Unicode character (0 to 2^16 - 1, inclusive)
float 32-bit IEEE 754 single-precision float
double 64-bit IEEE 754 double-precision float
returnAddress address of an opcode within the same method
reference reference to an object on the heap, or null

Table 5-1. Ranges of the Java virtual machine's data types
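The wrapper classes in java.lang expose these bounds as constants, so the integral ranges of Table 5-1 can be checked directly:

```java
public class Ranges {
    public static void main(String[] args) {
        // The java.lang wrapper classes carry the bounds from Table 5-1.
        System.out.println(Byte.MIN_VALUE + " to " + Byte.MAX_VALUE);       // -128 to 127
        System.out.println(Short.MIN_VALUE + " to " + Short.MAX_VALUE);     // -32768 to 32767
        System.out.println(Integer.MIN_VALUE + " to " + Integer.MAX_VALUE); // -2147483648 to 2147483647
        System.out.println(Long.MIN_VALUE + " to " + Long.MAX_VALUE);
        // char is unsigned: 0 to 2^16 - 1.
        System.out.println((int) Character.MIN_VALUE + " to " + (int) Character.MAX_VALUE); // 0 to 65535
    }
}
```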

Word Size

The basic unit of size for data values in the Java virtual machine is the word--a fixed size chosen by the designer of each Java virtual machine implementation. The word size must be large enough to hold a value of type byte, short, int, char, float, returnAddress, or reference. Two words must be large enough to hold a value of type long or double. An implementation designer must therefore choose a word size that is at least 32 bits, but otherwise can pick whatever word size will yield the most efficient implementation. The word size is often chosen to be the size of a native pointer on the host platform.

The specification of many of the Java virtual machine's runtime data areas are based upon this abstract concept of a word. For example, two sections of a Java stack frame--the local variables and operand stack-- are defined in terms of words. These areas can contain values of any of the virtual machine's data types. When placed into the local variables or operand stack, a value occupies either one or two words.

As they run, Java programs cannot determine the word size of their host virtual machine implementation. The word size does not affect the behavior of a program. It is only an internal attribute of a virtual machine implementation.

The Class Loader Subsystem

The part of a Java virtual machine implementation that takes care of finding and loading types is the class loader subsystem. Chapter 1, "Introduction to Java's Architecture," gives an overview of this subsystem. Chapter 3, "Security," shows how the subsystem fits into Java's security model. This chapter describes the class loader subsystem in more detail and shows how it relates to the other components of the virtual machine's internal architecture.

As mentioned in Chapter 1, the Java virtual machine contains two kinds of class loaders: a bootstrap class loader and user-defined class loaders. The bootstrap class loader is a part of the virtual machine implementation, and user-defined class loaders are part of the running Java application. Classes loaded by different class loaders are placed into separate name spaces inside the Java virtual machine.

The class loader subsystem involves many other parts of the Java virtual machine and several classes from the java.lang library. For example, user-defined class loaders are regular Java objects whose class descends from java.lang.ClassLoader. The methods of class ClassLoader allow Java applications to access the virtual machine's class loading machinery. Also, for every type a Java virtual machine loads, it creates an instance of class java.lang.Class to represent that type. Like all objects, user-defined class loaders and instances of class Class reside on the heap. Data for loaded types resides in the method area.
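A short example of these relationships: every loaded type is represented by a Class object, and getClassLoader() reveals which loader loaded it. Note that the bootstrap class loader is reported as null:

```java
public class ClassInfo {
    public static void main(String[] args) throws ClassNotFoundException {
        // Every loaded type is represented by an instance of java.lang.Class.
        Class<?> byLiteral = String.class;
        Class<?> byName = Class.forName("java.lang.String");
        System.out.println(byLiteral == byName); // true: same type, same loader

        // Types of the Java API are loaded by the bootstrap class loader,
        // which getClassLoader() reports as null.
        System.out.println(String.class.getClassLoader()); // null

        // Application classes come from a user-defined class loader.
        System.out.println(ClassInfo.class.getClassLoader() != null);
    }
}
```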

Loading, Linking and Initialization

The class loader subsystem is responsible for more than just locating and importing the binary data for classes. It must also verify the correctness of imported classes, allocate and initialize memory for class variables, and assist in the resolution of symbolic references. These activities are performed in a strict order:

  1. Loading: finding and importing the binary data for a type
  2. Linking: performing verification, preparation, and (optionally) resolution
    1. Verification: ensuring the correctness of the imported type
    2. Preparation: allocating memory for class variables and initializing the memory to default values
    3. Resolution: transforming symbolic references from the type into direct references.
  3. Initialization: invoking Java code that initializes class variables to their proper starting values.

The details of these processes are given in Chapter 7, "The Lifetime of a Type."

The Bootstrap Class Loader

Java virtual machine implementations must be able to recognize and load classes and interfaces stored in binary files that conform to the Java class file format. An implementation is free to recognize other binary forms besides class files, but it must recognize class files.

Every Java virtual machine implementation has a bootstrap class loader, which knows how to load trusted classes, including the classes of the Java API. The Java virtual machine specification doesn't define how the bootstrap loader should locate classes. That is another decision the specification leaves to implementation designers.

Given a fully qualified type name, the bootstrap class loader must in some way attempt to produce the data that defines the type. One common approach is demonstrated by the Java virtual machine implementation in Sun's 1.1 JDK on Windows 98. This implementation searches a user-defined directory path stored in an environment variable named CLASSPATH. The bootstrap loader looks in each directory, in the order the directories appear in the CLASSPATH, until it finds a file with the appropriate name: the type's simple name plus ".class". Unless the type is part of the unnamed package, the bootstrap loader expects the file to be in a subdirectory of one of the directories in the CLASSPATH. The path name of the subdirectory is built from the package name of the type. For example, if the bootstrap class loader is searching for class java.lang.Object, it will look for Object.class in the java\lang subdirectory of each CLASSPATH directory.

In 1.2, the bootstrap class loader of Sun's Java 2 SDK only looks in the directory in which the system classes (the class files of the Java API) were installed. The bootstrap class loader of the implementation of the Java virtual machine from Sun's Java 2 SDK does not look on the CLASSPATH. In Sun's Java 2 SDK virtual machine, searching the class path is the job of the system class loader, a user-defined class loader that is created automatically when the virtual machine starts up. More information on the class loading scheme of Sun's Java 2 SDK is given in Chapter 8, "The Linking Model."

User-Defined Class Loaders

Although user-defined class loaders themselves are part of the Java application, four of the methods in class ClassLoader are gateways into the Java virtual machine:

// Four of the methods declared in class java.lang.ClassLoader:
protected final Class defineClass(String name, byte data[],
    int offset, int length);
protected final Class defineClass(String name, byte data[],
    int offset, int length, ProtectionDomain protectionDomain);
protected final Class findSystemClass(String name);
protected final void resolveClass(Class c);

Any Java virtual machine implementation must take care to connect these methods of class ClassLoader to the internal class loader subsystem.

The two overloaded defineClass() methods accept a byte array, data[], as input. Starting at position offset in the array and continuing for length bytes, class ClassLoader expects binary data conforming to the Java class file format--binary data that represents a new type for the running application -- with the fully qualified name specified in name. The type is assigned to either a default protection domain, if the first version of defineClass() is used, or to the protection domain object referenced by the protectionDomain parameter. Every Java virtual machine implementation must make sure the defineClass() method of class ClassLoader can cause a new type to be imported into the method area.

The findSystemClass() method accepts a String representing a fully qualified name of a type. When a user-defined class loader invokes this method in versions 1.0 and 1.1, it is requesting that the virtual machine attempt to load the named type via its bootstrap class loader. If the bootstrap class loader has already loaded or successfully loads the type, it returns a reference to the Class object representing the type. If it can't locate the binary data for the type, it throws ClassNotFoundException. In version 1.2, the findSystemClass() method attempts to load the requested type from the system class loader. Every Java virtual machine implementation must make sure the findSystemClass() method can invoke the bootstrap (if version 1.0 or 1.1) or system (if version 1.2 or later) class loader in this way.

The resolveClass() method accepts a reference to a Class instance. This method causes the type represented by the Class instance to be linked (if it hasn't already been linked). The defineClass() method, described previously, only takes care of loading. (See the previous section, "Loading, Linking, and Initialization," for definitions of these terms.) When defineClass() returns a Class instance, the binary file for the type has definitely been located and imported into the method area, but not necessarily linked and initialized. Java virtual machine implementations make sure the resolveClass() method of class ClassLoader can cause the class loader subsystem to perform linking.
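
As a rough sketch of how these gateway methods get used, here is a minimal user-defined class loader that funnels class file bytes through defineClass(). It uses the modern findClass() idiom (delegation to the parent happens in loadClass()) rather than the 1.0/1.1-era loadClass() override, and the class name SimpleLoader is invented for illustration:

```java
import java.io.IOException;
import java.io.InputStream;

// A minimal user-defined class loader that loads class files found as
// resources on the class path. Illustrative sketch only.
public class SimpleLoader extends ClassLoader {

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        String path = name.replace('.', '/') + ".class";
        try (InputStream in = getResourceAsStream(path)) {
            if (in == null) {
                throw new ClassNotFoundException(name);
            }
            byte[] data = in.readAllBytes();
            // The gateway into the virtual machine: import the type into
            // the method area, under the default protection domain.
            return defineClass(name, data, 0, data.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}
```

Note that loadClass(), inherited from ClassLoader, delegates to the parent loader first, so trusted classes such as java.lang.Object are never redefined by this loader.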

The details of how a Java virtual machine performs class loading, linking, and initialization with user-defined class loaders are given in Chapter 8, "The Linking Model."

Name Spaces

As mentioned in Chapter 3, "Security," each class loader maintains its own name space populated by the types it has loaded. Because each class loader has its own name space, a single Java application can load multiple types with the same fully qualified name. A type's fully qualified name, therefore, is not always enough to uniquely identify it inside a Java virtual machine instance. If multiple types of that same name have been loaded into different name spaces, the identity of the class loader that loaded the type (the identity of the name space it is in) will also be needed to uniquely identify that type.

Name spaces arise inside a Java virtual machine instance as a result of the process of resolution. As part of the data for each loaded type, the Java virtual machine keeps track of the class loader that imported the type. When the virtual machine needs to resolve a symbolic reference from one class to another, it requests the referenced class from the same class loader that loaded the referencing class. This process is described in detail in Chapter 8, "The Linking Model."
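
The effect of separate name spaces can be demonstrated directly. In the following sketch (the class names are invented for illustration), two instances of the same loader class each define a type from the same class file bytes, and the virtual machine treats the results as two distinct types even though their fully qualified names match:

```java
import java.io.InputStream;

// Demonstrates that each class loader has its own name space: the same
// class file, defined by two loaders, yields two distinct Class objects.
public class NameSpaceDemo {

    public static class Greeter { }   // the type we will load twice

    // A loader that defines Greeter itself instead of delegating to its parent.
    static class IsolatingLoader extends ClassLoader {
        @Override
        public Class<?> loadClass(String name) throws ClassNotFoundException {
            if (!name.equals(Greeter.class.getName())) {
                return super.loadClass(name); // delegate everything else
            }
            String path = name.replace('.', '/') + ".class";
            try (InputStream in = NameSpaceDemo.class.getClassLoader()
                    .getResourceAsStream(path)) {
                byte[] data = in.readAllBytes();
                return defineClass(name, data, 0, data.length);
            } catch (Exception e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Class<?> a = new IsolatingLoader().loadClass(Greeter.class.getName());
        Class<?> b = new IsolatingLoader().loadClass(Greeter.class.getName());
        System.out.println(a.getName().equals(b.getName())); // same name
        System.out.println(a == b);                          // distinct types
    }
}
```

Because the two Class objects live in different name spaces, the fully qualified name alone no longer identifies the type; the identity of the defining loader is needed as well.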

The Method Area

Inside a Java virtual machine instance, information about loaded types is stored in a logical area of memory called the method area. When the Java virtual machine loads a type, it uses a class loader to locate the appropriate class file. The class loader reads in the class file--a linear stream of binary data--and passes it to the virtual machine. The virtual machine extracts information about the type from the binary data and stores the information in the method area. Memory for class (static) variables declared in the class is also taken from the method area.

The manner in which a Java virtual machine implementation represents type information internally is a decision of the implementation designer. For example, multi-byte quantities in class files are stored in big-endian (most significant byte first) order. When the data is imported into the method area, however, a virtual machine can store the data in any manner. If an implementation sits on top of a little-endian processor, the designers may decide to store multi-byte values in the method area in little-endian order.
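
The big-endian convention is easy to observe. DataInputStream always reads multi-byte quantities in big-endian order, matching the class file format regardless of the host processor's byte order, so reading a class file's first four bytes yields the magic number 0xCAFEBABE. The class name MagicCheck is invented for illustration:

```java
import java.io.DataInputStream;
import java.io.InputStream;

public class MagicCheck {

    // Reads the first big-endian u4 of a class file: the magic number.
    // DataInputStream reads big-endian on every platform, so this works
    // identically on little-endian and big-endian hosts.
    static int readMagic(InputStream raw) throws Exception {
        try (DataInputStream in = new DataInputStream(raw)) {
            return in.readInt();
        }
    }

    public static void main(String[] args) throws Exception {
        // .class resources are always accessible, even from named modules.
        try (InputStream in =
                 Object.class.getResourceAsStream("/java/lang/Object.class")) {
            System.out.printf("magic = 0x%08X%n", readMagic(in));
        }
    }
}
```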

The virtual machine will search through and use the type information stored in the method area as it executes the application it is hosting. Designers must attempt to devise data structures that will facilitate speedy execution of the Java application, but must also think of compactness. If designing an implementation that will operate under low memory constraints, designers may decide to trade off some execution speed in favor of compactness. If designing an implementation that will run on a virtual memory system, on the other hand, designers may decide to store redundant information in the method area to facilitate execution speed. (If the underlying host doesn't offer virtual memory, but does offer a hard disk, designers could create their own virtual memory system as part of their implementation.) Designers can choose whatever data structures and organization they feel optimize their implementations performance, in the context of its requirements.

All threads share the same method area, so access to the method area's data structures must be designed to be thread-safe. If two threads are attempting to find a class named Lava, for example, and Lava has not yet been loaded, only one thread should be allowed to load it while the other one waits.

The size of the method area need not be fixed. As the Java application runs, the virtual machine can expand and contract the method area to fit the application's needs. Also, the memory of the method area need not be contiguous. It could be allocated on a heap--even on the virtual machine's own heap. Implementations may allow users or programmers to specify an initial size for the method area, as well as a maximum or minimum size.

The method area can also be garbage collected. Because Java programs can be dynamically extended via user-defined class loaders, classes can become "unreferenced" by the application. If a class becomes unreferenced, a Java virtual machine can unload the class (garbage collect it) to keep the memory occupied by the method area at a minimum. The unloading of classes--including the conditions under which a class can become "unreferenced"--is described in Chapter 7, "The Lifetime of a Type."

Type Information

For each type it loads, a Java virtual machine must store the following kinds of information in the method area:

  • The fully qualified name of the type
  • The fully qualified name of the type's direct superclass (unless the type is an interface or class java.lang.Object, neither of which have a superclass)
  • Whether or not the type is a class or an interface
  • The type's modifiers (some subset of public, abstract, final)
  • An ordered list of the fully qualified names of any direct superinterfaces

Inside the Java class file and Java virtual machine, type names are always stored as fully qualified names. In Java source code, a fully qualified name is the name of a type's package, plus a dot, plus the type's simple name. For example, the fully qualified name of class Object in package java.lang is java.lang.Object. In class files, the dots are replaced by slashes, as in java/lang/Object. In the method area, fully qualified names can be represented in whatever form and data structures a designer chooses.

In addition to the basic type information listed previously, the virtual machine must also store for each loaded type:

  • The constant pool for the type
  • Field information
  • Method information
  • All class (static) variables declared in the type, except constants
  • A reference to class ClassLoader
  • A reference to class Class

This data is described in the following sections.

The Constant Pool

For each type it loads, a Java virtual machine must store a constant pool. A constant pool is an ordered set of constants used by the type, including literals (string, integer, and floating point constants) and symbolic references to types, fields, and methods. Entries in the constant pool are referenced by index, much like the elements of an array. Because it holds symbolic references to all types, fields, and methods used by a type, the constant pool plays a central role in the dynamic linking of Java programs. The constant pool is described in more detail later in this chapter and in Chapter 6, "The Java Class File."

Field Information

For each field declared in the type, the following information must be stored in the method area. In addition to the information for each field, the order in which the fields are declared by the class or interface must also be recorded. Here's the list for fields:

  • The field's name
  • The field's type
  • The field's modifiers (some subset of public, private, protected, static, final, volatile, transient)

Method Information

For each method declared in the type, the following information must be stored in the method area. As with fields, the order in which the methods are declared by the class or interface must be recorded as well as the data. Here's the list:

  • The method's name
  • The method's return type (or void)
  • The number and types (in order) of the method's parameters
  • The method's modifiers (some subset of public, private, protected, static, final, synchronized, native, abstract)

In addition to the items listed previously, the following information must also be stored with each method that is not abstract or native:

  • The method's bytecodes
  • The sizes of the operand stack and local variables sections of the method's stack frame (these are described in a later section of this chapter)
  • An exception table (this is described in Chapter 17, "Exceptions")

Class Variables

Class variables are shared among all instances of a class and can be accessed even in the absence of any instance. These variables are associated with the class--not with instances of the class--so they are logically part of the class data in the method area. Before a Java virtual machine uses a class, it must allocate memory from the method area for each non-final class variable declared in the class.

Constants (class variables declared final) are not treated in the same way as non-final class variables. Every type that uses a final class variable gets a copy of the constant value in its own constant pool. As part of the constant pool, final class variables are stored in the method area--just like non-final class variables. But whereas non-final class variables are stored as part of the data for the type that declares them, final class variables are stored as part of the data for any type that uses them. This special treatment of constants is explained in more detail in Chapter 6, "The Java Class File."
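
This special treatment has a visible consequence: because a compile-time constant's value is copied into the using class's own constant pool, reading the constant does not even initialize the declaring class. The class names below are invented for illustration:

```java
// Tracks whether Config's static initializer has run.
class Probe {
    static boolean configInitialized = false;
}

class Config {
    static { Probe.configInitialized = true; }  // runs only on initialization
    static final int MAX = 42;                  // compile-time constant
    static int counter = 0;                     // ordinary class variable
}

public class ConstantDemo {
    public static void main(String[] args) {
        int m = Config.MAX;          // the value 42 came from ConstantDemo's
                                     // own constant pool; Config stays
                                     // uninitialized
        System.out.println(Probe.configInitialized); // false

        int c = Config.counter;      // touching a non-constant class variable
                                     // does initialize Config
        System.out.println(Probe.configInitialized); // true
    }
}
```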

A Reference to Class ClassLoader

For each type it loads, a Java virtual machine must keep track of whether or not the type was loaded via the bootstrap class loader or a user-defined class loader. For those types loaded via a user-defined class loader, the virtual machine must store a reference to the user-defined class loader that loaded the type. This information is stored as part of the type's data in the method area.

The virtual machine uses this information during dynamic linking. When one type refers to another type, the virtual machine requests the referenced type from the same class loader that loaded the referencing type. This process of dynamic linking is also central to the way the virtual machine forms separate name spaces. To be able to properly perform dynamic linking and maintain multiple name spaces, the virtual machine needs to know what class loader loaded each type in its method area. The details of dynamic linking and name spaces are given in Chapter 8, "The Linking Model."

A Reference to Class Class

An instance of class java.lang.Class is created by the Java virtual machine for every type it loads. The virtual machine must in some way associate a reference to the Class instance for a type with the type's data in the method area.

Your Java programs can obtain and use references to Class objects. One static method in class Class allows you to get a reference to the Class instance for any loaded class:

// A method declared in class java.lang.Class:
public static Class forName(String className) throws ClassNotFoundException;

If you invoke forName("java.lang.Object"), for example, you will get a reference to the Class object that represents java.lang.Object. If you invoke forName("java.util.Enumeration"), you will get a reference to the Class object that represents the Enumeration interface from the java.util package. You can use forName() to get a Class reference for any loaded type from any package, so long as the type can be (or already has been) loaded into the current name space. If the virtual machine is unable to load the requested type into the current name space, forName() will throw ClassNotFoundException.

An alternative way to get a Class reference is to invoke getClass() on any object reference. This method is inherited by every object from class Object itself:

// A method declared in class java.lang.Object:
public final Class getClass();

If you have a reference to an object of class java.lang.Integer, for example, you could get the Class object for java.lang.Integer simply by invoking getClass() on your reference to the Integer object.

Given a reference to a Class object, you can find out information about the type by invoking methods declared in class Class. If you look at these methods, you will quickly realize that class Class gives the running application access to the information stored in the method area. Here are some of the methods declared in class Class:

// Some of the methods declared in class java.lang.Class:
public String getName();
public Class getSuperclass();
public boolean isInterface();
public Class[] getInterfaces();
public ClassLoader getClassLoader();

These methods just return information about a loaded type. getName() returns the fully qualified name of the type. getSuperclass() returns the Class instance for the type's direct superclass. If the type is class java.lang.Object or an interface, neither of which has a superclass, getSuperclass() returns null. isInterface() returns true if the Class object describes an interface, false if it describes a class. getInterfaces() returns an array of Class objects, one for each direct superinterface. The superinterfaces appear in the array in the order they are declared as superinterfaces by the type. If the type has no direct superinterfaces, getInterfaces() returns an array of length zero. getClassLoader() returns a reference to the ClassLoader object that loaded this type, or null if the type was loaded by the bootstrap class loader. All this information comes straight out of the method area.
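
For example, the expected values below follow from java.util.ArrayList's declaration in the Java API:

```java
public class ClassInfo {
    public static void main(String[] args) throws Exception {
        Class<?> c = Class.forName("java.util.ArrayList");

        System.out.println(c.getName());                  // java.util.ArrayList
        System.out.println(c.getSuperclass().getName());  // java.util.AbstractList
        System.out.println(c.isInterface());              // false
        for (Class<?> i : c.getInterfaces()) {            // List, RandomAccess, ...
            System.out.println("  implements " + i.getName());
        }
        // null: ArrayList lives in the Java API and was loaded by the
        // bootstrap class loader
        System.out.println(c.getClassLoader());
    }
}
```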

Method Tables

The type information stored in the method area must be organized to be quickly accessible. In addition to the raw type information listed previously, implementations may include other data structures that speed up access to the raw data. One example of such a data structure is a method table. For each non-abstract class a Java virtual machine loads, it could generate a method table and include it as part of the class information it stores in the method area. A method table is an array of direct references to all the instance methods that may be invoked on a class instance, including instance methods inherited from superclasses. (A method table isn't helpful in the case of abstract classes or interfaces, because the program will never instantiate these.) A method table allows a virtual machine to quickly locate an instance method invoked on an object. Method tables are described in detail in Chapter 8, "The Linking Model."

An Example of Method Area Use

As an example of how the Java virtual machine uses the information it stores in the method area, consider these classes:

// On CD-ROM in file jvm/ex2/Lava.java
class Lava {

    private int speed = 5; // 5 kilometers per hour

    void flow() {
    }
}

// On CD-ROM in file jvm/ex2/Volcano.java
class Volcano {

    public static void main(String[] args) {
        Lava lava = new Lava();
        lava.flow();
    }
}

The following paragraphs describe how an implementation might execute the first instruction in the bytecodes for the main() method of the Volcano application. Different implementations of the Java virtual machine can operate in very different ways. The following description illustrates one way--but not the only way--a Java virtual machine could execute the first instruction of Volcano's main() method.

To run the Volcano application, you give the name "Volcano" to a Java virtual machine in an implementation-dependent manner. Given the name Volcano, the virtual machine finds and reads in file Volcano.class. It extracts the definition of class Volcano from the binary data in the imported class file and places the information into the method area. The virtual machine then invokes the main() method, by interpreting the bytecodes stored in the method area. As the virtual machine executes main(), it maintains a pointer to the constant pool (a data structure in the method area) for the current class (class Volcano).

Note that this Java virtual machine has already begun to execute the bytecodes for main() in class Volcano even though it hasn't yet loaded class Lava. Like many (probably most) implementations of the Java virtual machine, this implementation doesn't wait until all classes used by the application are loaded before it begins executing main(). It loads classes only as it needs them.

main()'s first instruction tells the Java virtual machine to allocate enough memory for the class listed in constant pool entry one. The virtual machine uses its pointer into Volcano's constant pool to look up entry one and finds a symbolic reference to class Lava. It checks the method area to see if Lava has already been loaded.

The symbolic reference is just a string giving the class's fully qualified name: "Lava". Here you can see that the method area must be organized so a class can be located--as quickly as possible--given only the class's fully qualified name. Implementation designers can choose whatever algorithm and data structures best fit their needs--a hash table, a search tree, anything. This same mechanism can be used by the static forName() method of class Class, which returns a Class reference given a fully qualified name.
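
A hypothetical sketch of such a lookup structure, using a hash table keyed by fully qualified name (all names here are invented; a real method area is far richer):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a method area indexing loaded type data by fully
// qualified name, so symbolic references can be resolved quickly.
public class MethodAreaSketch {

    static class TypeData {
        final String fullyQualifiedName;
        TypeData(String name) { this.fullyQualifiedName = name; }
    }

    private final Map<String, TypeData> loadedTypes = new HashMap<>();

    // Returns the already-loaded type, or "loads" it first (stubbed here
    // as simply creating the entry).
    TypeData resolve(String fullyQualifiedName) {
        return loadedTypes.computeIfAbsent(fullyQualifiedName, TypeData::new);
    }
}
```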

When the virtual machine discovers that it hasn't yet loaded a class named "Lava," it proceeds to find and read in file Lava.class. It extracts the definition of class Lava from the imported binary data and places the information into the method area.

The Java virtual machine then replaces the symbolic reference in Volcano's constant pool entry one, which is just the string "Lava", with a pointer to the class data for Lava. If the virtual machine ever has to use Volcano's constant pool entry one again, it won't have to go through the relatively slow process of searching through the method area for class Lava given only a symbolic reference, the string "Lava". It can just use the pointer to more quickly access the class data for Lava. This process of replacing symbolic references with direct references (in this case, a native pointer) is called constant pool resolution. The symbolic reference is resolved into a direct reference by searching through the method area until the referenced entity is found, loading new classes if necessary.

Finally, the virtual machine is ready to actually allocate memory for a new Lava object. Once again, the virtual machine consults the information stored in the method area. It uses the pointer (which was just put into Volcano's constant pool entry one) to the Lava data (which was just imported into the method area) to find out how much heap space is required by a Lava object.

A Java virtual machine can always determine the amount of memory required to represent an object by looking into the class data stored in the method area. The actual amount of heap space required by a particular object, however, is implementation-dependent. The internal representation of objects inside a Java virtual machine is another decision of implementation designers. Object representation is discussed in more detail later in this chapter.

Once the Java virtual machine has determined the amount of heap space required by a Lava object, it allocates that space on the heap and initializes the instance variable speed to zero, its default initial value. If class Lava's superclass, Object, has any instance variables, those are also initialized to default initial values. (The details of initialization of both classes and objects are given in Chapter 7, "The Lifetime of a Type.")

The first instruction of main() completes by pushing a reference to the new Lava object onto the stack. A later instruction will use the reference to invoke Java code that initializes the speed variable to its proper initial value, five. Another instruction will use the reference to invoke the flow() method on the referenced Lava object.

The Heap

Whenever a class instance or array is created in a running Java application, the memory for the new object is allocated from a single heap. As there is only one heap inside a Java virtual machine instance, all threads share it. Because a Java application runs inside its "own" exclusive Java virtual machine instance, there is a separate heap for every individual running application. There is no way two different Java applications could trample on each other's heap data. Two different threads of the same application, however, could trample on each other's heap data. This is why you must be concerned about proper synchronization of multi-threaded access to objects (heap data) in your Java programs.
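
A minimal example of the synchronization this implies (the counter class is invented for illustration): two threads of one application update the same heap object, and the synchronized methods make each read-modify-write atomic.

```java
// Two threads of one application share the heap, so unsynchronized updates
// to a shared object can be lost. Declaring increment() synchronized makes
// the read-modify-write of count atomic.
public class SharedCounter {
    private int count = 0;

    public synchronized void increment() { count++; }
    public synchronized int get() { return count; }

    public static void main(String[] args) throws InterruptedException {
        SharedCounter counter = new SharedCounter();
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) counter.increment();
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(counter.get()); // always 200000 with synchronization
    }
}
```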

The Java virtual machine has an instruction that allocates memory on the heap for a new object, but has no instruction for freeing that memory. Just as you can't explicitly free an object in Java source code, you can't explicitly free an object in Java bytecodes. The virtual machine itself is responsible for deciding whether and when to free memory occupied by objects that are no longer referenced by the running application. Usually, a Java virtual machine implementation uses a garbage collector to manage the heap.

Garbage Collection

A garbage collector's primary function is to automatically reclaim the memory used by objects that are no longer referenced by the running application. It may also move objects as the application runs to reduce heap fragmentation.

A garbage collector is not strictly required by the Java virtual machine specification. The specification only requires that an implementation manage its own heap in some manner. For example, an implementation could simply have a fixed amount of heap space available and throw an OutOfMemoryError when that space fills up. While this implementation may not win many prizes, it does qualify as a Java virtual machine. The Java virtual machine specification does not say how much memory an implementation must make available to running programs. It does not say how an implementation must manage its heap. It says to implementation designers only that the program will be allocating memory from the heap, but not freeing it. It is up to designers to figure out how they want to deal with that fact.

No garbage collection technique is dictated by the Java virtual machine specification. Designers can use whatever techniques seem most appropriate given their goals, constraints, and talents. Because references to objects can exist in many places--Java Stacks, the heap, the method area, native method stacks--the choice of garbage collection technique heavily influences the design of an implementation's runtime data areas. Various garbage collection techniques are described in Chapter 9, "Garbage Collection."

As with the method area, the memory that makes up the heap need not be contiguous, and may be expanded and contracted as the running program progresses. An implementation's method area could, in fact, be implemented on top of its heap. In other words, when a virtual machine needs memory for a freshly loaded class, it could take that memory from the same heap on which objects reside. The same garbage collector that frees memory occupied by unreferenced objects could take care of finding and freeing (unloading) unreferenced classes. Implementations may allow users or programmers to specify an initial size for the heap, as well as a maximum and minimum size.

Object Representation

The Java virtual machine specification is silent on how objects should be represented on the heap. Object representation--an integral aspect of the overall design of the heap and garbage collector--is a decision of implementation designers.

The primary data that must in some way be represented for each object is the instance variables declared in the object's class and all its superclasses. Given an object reference, the virtual machine must be able to quickly locate the instance data for the object. In addition, there must be some way to access an object's class data (stored in the method area) given a reference to the object. For this reason, the memory allocated for an object usually includes some kind of pointer into the method area.

One possible heap design divides the heap into two parts: a handle pool and an object pool. An object reference is a native pointer to a handle pool entry. A handle pool entry has two components: a pointer to instance data in the object pool and a pointer to class data in the method area. The advantage of this scheme is that it makes it easy for the virtual machine to combat heap fragmentation. When the virtual machine moves an object in the object pool, it need only update one pointer with the object's new address: the relevant pointer in the handle pool. The disadvantage of this approach is that every access to an object's instance data requires dereferencing two pointers. This approach to object representation is shown graphically in Figure 5-5. This kind of heap is demonstrated interactively by the HeapOfFish applet, described in Chapter 9, "Garbage Collection."



Figure 5-5. Splitting an object across a handle pool and object pool.

Another design makes an object reference a native pointer to a bundle of data that contains the object's instance data and a pointer to the object's class data. This approach requires dereferencing only one pointer to access an object's instance data, but makes moving objects more complicated. When the virtual machine moves an object to combat fragmentation of this kind of heap, it must update every reference to that object anywhere in the runtime data areas. This approach to object representation is shown graphically in Figure 5-6.
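
A toy model (not a real implementation) can make the trade-off of the handle-pool design concrete: moving an object's instance data requires updating only the one pointer inside its handle, while every reference held by the program stays valid. All names here are invented:

```java
// Toy model of the handle-pool design in Figure 5-5: an object reference
// points to a handle; each handle points to instance data (a simulated
// object pool) and to class data (a simulated method area).
public class HandlePoolModel {

    static class ClassData {
        final String name;
        ClassData(String name) { this.name = name; }
    }

    static class Handle {
        int[] instanceData;   // "pointer" into the object pool
        ClassData classData;  // "pointer" into the method area
    }

    // Compacting the object pool: copy the instance data to its new home
    // and update the single pointer in the handle. References to the handle
    // itself never change.
    static void moveInstanceData(Handle h, int[] newLocation) {
        System.arraycopy(h.instanceData, 0, newLocation, 0, h.instanceData.length);
        h.instanceData = newLocation;
    }
}
```

The cost, as the text notes, is the second dereference: every field access goes through the handle before reaching the instance data.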



Figure 5-6. Keeping object data all in one place.

The virtual machine needs to get from an object reference to that object's class data for several reasons. When a running program attempts to cast an object reference to another type, the virtual machine must check to see if the type being cast to is the actual class of the referenced object or one of its supertypes. It must perform the same kind of check when a program performs an instanceof operation. In either case, the virtual machine must look into the class data of the referenced object. When a program invokes an instance method, the virtual machine must perform dynamic binding: it must choose the method to invoke based not on the type of the reference but on the class of the object. To do this, it must once again have access to the class data given only a reference to the object.

No matter what object representation an implementation uses, it is likely that a method table is close at hand for each object. Method tables, because they speed up the invocation of instance methods, can play an important role in achieving good overall performance for a virtual machine implementation. Method tables are not required by the Java virtual machine specification and may not exist in all implementations. Implementations that have extremely low memory requirements, for instance, may not be able to afford the extra memory space method tables occupy. If an implementation does use method tables, however, an object's method table will likely be quickly accessible given just a reference to the object.

One way an implementation could connect a method table to an object reference is shown graphically in Figure 5-7. This figure shows that the pointer kept with the instance data for each object points to a special structure. The special structure has two components:

  • A pointer to the full class data for the object
  • The method table for the object

The method table is an array of pointers to the data for each instance method that can be invoked on objects of that class. The method data pointed to by the method table includes:

  • The sizes of the operand stack and local variables sections of the method's stack
  • The method's bytecodes
  • An exception table

This gives the virtual machine enough information to invoke the method. The method table includes pointers to data for methods declared explicitly in the object's class or inherited from superclasses. In other words, the pointers in the method table may point to methods defined in the object's class or any of its superclasses. More information on method tables is given in Chapter 8, "The Linking Model."



Figure 5-7. Keeping the method table close at hand.

If you are familiar with the inner workings of C++, you may recognize the method table as similar to the VTBL or virtual table of C++ objects. In C++, objects are represented by their instance data plus an array of pointers to any virtual functions that can be invoked on the object. This approach could also be taken by a Java virtual machine implementation. An implementation could include a copy of the method table for a class as part of the heap image for every instance of that class. This approach would consume more heap space than the approach shown in Figure 5-7, but might yield slightly better performance on systems that enjoy large quantities of available memory.

One other kind of data that is not shown in Figures 5-5 and 5-6, but which is logically part of an object's data on the heap, is the object's lock. Each object in a Java virtual machine is associated with a lock (or mutex) that a program can use to coordinate multi-threaded access to the object. Only one thread at a time can "own" an object's lock. While a particular thread owns a particular object's lock, only that thread can access that object's instance variables. All other threads that attempt to access the object's variables have to wait until the owning thread releases the object's lock. If a thread requests a lock that is already owned by another thread, the requesting thread has to wait until the owning thread releases the lock. Once a thread owns a lock, it can request the same lock again multiple times, but then has to release the lock the same number of times before it is made available to other threads. If a thread requests a lock three times, for example, that thread will continue to own the lock until it has released it three times.
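The re-request behavior described above is visible at the Java language level: a `synchronized` method that calls another `synchronized` method on the same object re-acquires the lock it already owns without blocking. A minimal sketch (the class and field names are invented for illustration):

```java
// Reentrancy of an object's lock: a thread that already owns the lock
// may acquire it again, and must release it the same number of times.
public class ReentrantDemo {
    int depth = 0;

    synchronized void outer() {
        depth++;
        inner();   // re-acquires this object's lock; does not deadlock
    }

    synchronized void inner() {
        depth++;
    }

    public static void main(String[] args) {
        ReentrantDemo d = new ReentrantDemo();
        d.outer();
        System.out.println(d.depth); // 2
    }
}
```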

Many objects will go through their entire lifetimes without ever being locked by a thread. The data required to implement an object's lock is not needed unless the lock is actually requested by a thread. As a result, many implementations, such as the ones shown in Figures 5-5 and 5-6, may not include a pointer to "lock data" within the object itself. Such implementations must create the necessary data to represent a lock when the lock is requested for the first time. In this scheme, the virtual machine must associate the lock with the object in some indirect way, such as by placing the lock data into a search tree based on the object's address.

Along with data that implements a lock, every Java object is logically associated with data that implements a wait set. Whereas locks help threads to work independently on shared data without interfering with one another, wait sets help threads to cooperate with one another--to work together towards a common goal.

Wait sets are used in conjunction with wait and notify methods. Every class inherits from Object three "wait methods" (overloaded forms of a method named wait()) and two "notify methods" (notify() and notifyAll()). When a thread invokes a wait method on an object, the Java virtual machine suspends that thread and adds it to that object's wait set. When a thread invokes a notify method on an object, the virtual machine will at some future time wake up one or more threads from that object's wait set. As with the data that implements an object's lock, the data that implements an object's wait set is not needed unless a wait or notify method is actually invoked on the object. As a result, many implementations of the Java virtual machine may keep the wait set data separate from the actual object data. Such implementations could allocate the data needed to represent an object's wait set when a wait or notify method is first invoked on that object by the running application. For more information about locks and wait sets, see Chapter 20, "Thread Synchronization."
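The wait/notify cooperation described above looks like this at the language level. The sketch below is a minimal, invented example: one thread parks itself in the monitor object's wait set until another thread changes shared state and notifies it. Note that both `wait()` and `notify()` must be invoked while owning the object's lock, and that the wait is wrapped in a loop to guard against spurious wakeups.

```java
// Minimal wait/notify cooperation between two threads.
public class WaitNotifyDemo {
    private final Object monitor = new Object();
    private boolean ready = false;

    void produce() {
        synchronized (monitor) {      // must own the lock to notify
            ready = true;
            monitor.notify();         // wake one thread from the wait set
        }
    }

    void awaitReady() throws InterruptedException {
        synchronized (monitor) {      // must own the lock to wait
            while (!ready) {          // re-check the condition after waking
                monitor.wait();       // releases the lock and suspends the thread
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        WaitNotifyDemo demo = new WaitNotifyDemo();
        Thread waiter = new Thread(() -> {
            try { demo.awaitReady(); } catch (InterruptedException ignored) {}
        });
        waiter.start();
        demo.produce();
        waiter.join();
        System.out.println("done");
    }
}
```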

One last example of a type of data that may be included as part of the image of an object on the heap is any data needed by the garbage collector. The garbage collector must in some way keep track of which objects are referenced by the program. This task invariably requires data to be kept for each object on the heap. The kind of data required depends upon the garbage collection technique being used. For example, if an implementation uses a mark and sweep algorithm, it must be able to mark an object as referenced or unreferenced. For each unreferenced object, it may also need to indicate whether or not the object's finalizer has been run. As with thread locks, this data may be kept separate from the object image. Some garbage collection techniques only require this extra data while the garbage collector is actually running. A mark and sweep algorithm, for instance, could potentially use a separate bitmap for marking referenced and unreferenced objects. More detail on various garbage collection techniques, and the data that is required by each of them, is given in Chapter 9, "Garbage Collection."

In addition to the data that a garbage collector uses to distinguish between referenced and unreferenced objects, a garbage collector needs data to keep track of the objects on which it has already executed a finalizer. Garbage collectors must run the finalizer of any object whose class declares one before it reclaims the memory occupied by that object. The Java language specification states that a garbage collector will only execute an object's finalizer once, but allows that finalizer to "resurrect" the object: to make the object referenced again. When the object becomes unreferenced for a second time, the garbage collector must not finalize it again. Because most objects will likely not have a finalizer, and very few of those will resurrect their objects, this scenario of garbage collecting the same object twice will probably be extremely rare. As a result, the data used to keep track of objects that have already been finalized, though logically part of the data associated with an object, will likely not be part of the object representation on the heap. In most cases, garbage collectors will keep this information in a separate place. Chapter 9, "Garbage Collection," gives more information about finalization.

Array Representation

In Java, arrays are full-fledged objects. Like objects, arrays are always stored on the heap. Also like objects, implementation designers can decide how they want to represent arrays on the heap.

Arrays have a Class instance associated with their class, just like any other object. All arrays of the same dimension and type have the same class. The length of an array (or the lengths of each dimension of a multidimensional array) does not play any role in establishing the array's class. For example, an array of three ints has the same class as an array of three hundred ints. The length of an array is considered part of its instance data.

The name of an array's class has one open square bracket for each dimension plus a letter or string representing the array's type. For example, the class name for an array of ints is "[I". The class name for a three-dimensional array of bytes is "[[[B". The class name for a two-dimensional array of Objects is "[[Ljava.lang.Object;". The full details of this naming convention for array classes are given in Chapter 6, "The Java Class File."

Multi-dimensional arrays are represented as arrays of arrays. A two dimensional array of ints, for example, would be represented by a one dimensional array of references to several one dimensional arrays of ints. This is shown graphically in Figure 5-8.
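Both points above can be checked from a running program: `getClass().getName()` reveals the bracket-letter class names, and because a two-dimensional array is just an array of references to arrays, its rows need not all have the same length.

```java
// Array class names, and multi-dimensional arrays as arrays of arrays.
public class ArrayDemo {
    public static void main(String[] args) {
        System.out.println(new int[3].getClass().getName());        // [I
        System.out.println(new int[300].getClass().getName());      // [I  (length plays no role)
        System.out.println(new byte[1][2][3].getClass().getName()); // [[[B
        System.out.println(new Object[2][2].getClass().getName());  // [[Ljava.lang.Object;

        // A "ragged" 2D array: one array of references to int arrays
        // of different lengths.
        int[][] ragged = new int[3][];
        ragged[0] = new int[1];
        ragged[1] = new int[5];
        ragged[2] = new int[2];
        System.out.println(ragged[1].length);                       // 5
    }
}
```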



Figure 5-8. One possible heap representation for arrays.

The data that must be kept on the heap for each array is the array's length, the array data, and some kind of reference to the array's class data. Given a reference to an array, the virtual machine must be able to determine the array's length, to get and set its elements by index (checking to make sure the array bounds are not exceeded), and to invoke any methods declared by Object, the direct superclass of all arrays.

The Program Counter

Each thread of a running program has its own pc register, or program counter, which is created when the thread is started. The pc register is one word in size, so it can hold either a native pointer or a returnAddress. As a thread executes a Java method, the pc register contains the address of the current instruction being executed by the thread. An "address" can be a native pointer or an offset from the beginning of a method's bytecodes. If a thread is executing a native method, the value of the pc register is undefined.

The Java Stack

When a new thread is launched, the Java virtual machine creates a new Java stack for the thread. As mentioned earlier, a Java stack stores a thread's state in discrete frames. The Java virtual machine only performs two operations directly on Java Stacks: it pushes and pops frames.

The method that is currently being executed by a thread is the thread's current method. The stack frame for the current method is the current frame. The class in which the current method is defined is called the current class, and the current class's constant pool is the current constant pool. As it executes a method, the Java virtual machine keeps track of the current class and current constant pool. When the virtual machine encounters instructions that operate on data stored in the stack frame, it performs those operations on the current frame.

When a thread invokes a Java method, the virtual machine creates and pushes a new frame onto the thread's Java stack. This new frame then becomes the current frame. As the method executes, it uses the frame to store parameters, local variables, intermediate computations, and other data.

A method can complete in either of two ways. If a method completes by returning, it is said to have normal completion. If it completes by throwing an exception, it is said to have abrupt completion. When a method completes, whether normally or abruptly, the Java virtual machine pops and discards the method's stack frame. The frame for the previous method then becomes the current frame.

All the data on a thread's Java stack is private to that thread. There is no way for a thread to access or alter the Java stack of another thread. Because of this, you need never worry about synchronizing multi-threaded access to local variables in your Java programs. When a thread invokes a method, the method's local variables are stored in a frame on the invoking thread's Java stack. Only one thread can ever access those local variables: the thread that invoked the method.

Like the method area and heap, the Java stack and stack frames need not be contiguous in memory. Frames could be allocated on a contiguous stack, or they could be allocated on a heap, or some combination of both. The actual data structures used to represent the Java stack and stack frames is a decision of implementation designers. Implementations may allow users or programmers to specify an initial size for Java stacks, as well as a maximum or minimum size.

The Stack Frame

The stack frame has three parts: local variables, operand stack, and frame data. The sizes of the local variables and operand stack, which are measured in words, depend upon the needs of each individual method. These sizes are determined at compile time and included in the class file data for each method. The size of the frame data is implementation dependent.

When the Java virtual machine invokes a Java method, it checks the class data to determine the number of words required by the method in the local variables and operand stack. It creates a stack frame of the proper size for the method and pushes it onto the Java stack.

Local Variables

The local variables section of the Java stack frame is organized as a zero-based array of words. Instructions that use a value from the local variables section provide an index into the zero-based array. Values of type int, float, reference, and returnAddress occupy one entry in the local variables array. Values of type byte, short, and char are converted to int before being stored into the local variables. Values of type long and double occupy two consecutive entries in the array.

To refer to a long or double in the local variables, instructions provide the index of the first of the two consecutive entries occupied by the value. For example, if a long occupies array entries three and four, instructions would refer to that long by index three. All values in the local variables are word-aligned. Dual-entry longs and doubles can start at any index.

The local variables section contains a method's parameters and local variables. Compilers place the parameters into the local variable array first, in the order in which they are declared. Figure 5-9 shows the local variables section for the following two methods:

// On CD-ROM in file jvm/ex3/Example3a.java
class Example3a {

    public static int runClassMethod(int i, long l, float f,
        double d, Object o, byte b) {

        return 0;
    }

    public int runInstanceMethod(char c, double d, short s,
        boolean b) {

        return 0;
    }
}



Figure 5-9. Method parameters on the local variables section of a Java stack.

Note that Figure 5-9 shows that the first parameter in the local variables for runInstanceMethod() is of type reference, even though no such parameter appears in the source code. This is the hidden this reference passed to every instance method. Instance methods use this reference to access the instance data of the object upon which they were invoked. As you can see by looking at the local variables for runClassMethod() in Figure 5-9, class methods do not receive a hidden this. Class methods are not invoked on objects. You can't directly access a class's instance variables from a class method, because there is no instance associated with the method invocation.

Note also that types byte, short, char, and boolean in the source code become ints in the local variables. This is also true of the operand stack. As mentioned earlier, the boolean type is not supported directly by the Java virtual machine. The Java compiler always uses ints to represent boolean values in the local variables or operand stack. Data types byte, short, and char, however, are supported directly by the Java virtual machine. These can be stored on the heap as instance variables or array elements, or in the method area as class variables. When placed into local variables or the operand stack, however, values of type byte, short, and char are converted into ints. They are manipulated as ints while on the stack frame, then converted back into byte, short, or char when stored back into heap or method area.
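This convert-to-int behavior has a visible counterpart in the Java language itself, where byte, short, and char values are promoted to int in arithmetic expressions. The sketch below (invented names, illustration only) mirrors it:

```java
// byte, short, char, and boolean occupy int-sized entries on the stack frame;
// at the language level, arithmetic on them is int arithmetic.
public class PromotionDemo {
    public static void main(String[] args) {
        char c = 'a';            // stored in a one-word local variable as an int
        int sum = c + 1;         // 'a' is 97, so the int result is 98
        System.out.println(sum); // 98

        byte b = 100;
        // b + b is computed as an int; storing it back into a byte requires
        // a cast, mirroring the convert-back step described above.
        byte twice = (byte) (b + b);
        System.out.println(twice); // -56 (200 wraps around in 8 bits)
    }
}
```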

Also note that Object o is passed as a reference to runClassMethod(). In Java, all objects are passed by reference. As all objects are stored on the heap, you will never find an image of an object in the local variables or operand stack, only object references.

Aside from a method's parameters, which compilers must place into the local variables array first and in order of declaration, Java compilers can arrange the local variables array as they wish. Compilers can place the method's local variables into the array in any order, and they can use the same array entry for more than one local variable. For example, if two local variables have limited scopes that don't overlap, such as the i and j local variables in Example3b, compilers are free to use the same array entry for both variables. During the first half of the method, before j comes into scope, entry zero could be used for i. During the second half of the method, after i has gone out of scope, entry zero could be used for j.

// On CD-ROM in file jvm/ex3/Example3b.java
class Example3b {

    public static void runtwoLoops() {

        for (int i = 0; i < 10; ++i) {
            System.out.println(i);
        }

        for (int j = 9; j >= 0; --j) {
            System.out.println(j);
        }
    }
}

As with all the other runtime memory areas, implementation designers can use whatever data structures they deem most appropriate to represent the local variables. The Java virtual machine specification does not indicate how longs and doubles should be split across the two array entries they occupy. Implementations that use a word size of 64 bits could, for example, store the entire long or double in the lower of the two consecutive entries, leaving the higher entry unused.

Operand Stack

Like the local variables, the operand stack is organized as an array of words. But unlike the local variables, which are accessed via array indices, the operand stack is accessed by pushing and popping values. If an instruction pushes a value onto the operand stack, a later instruction can pop and use that value.

The virtual machine stores the same data types in the operand stack that it stores in the local variables: int, long, float, double, reference, and returnAddress. It converts values of type byte, short, and char to int before pushing them onto the operand stack.

Other than the program counter, which can't be directly accessed by instructions, the Java virtual machine has no registers. The Java virtual machine is stack-based rather than register-based because its instructions take their operands from the operand stack rather than from registers. Instructions can also take operands from other places, such as immediately following the opcode (the byte representing the instruction) in the bytecode stream, or from the constant pool. The Java virtual machine instruction set's main focus of attention, however, is the operand stack.

The Java virtual machine uses the operand stack as a work space. Many instructions pop values from the operand stack, operate on them, and push the result. For example, the iadd instruction adds two integers by popping two ints off the top of the operand stack, adding them, and pushing the int result. Here is how a Java virtual machine would add two local variables that contain ints and store the int result in a third local variable:

iload_0    // push the int in local variable 0
iload_1    // push the int in local variable 1
iadd       // pop two ints, add them, push result
istore_2   // pop int, store into local variable 2

In this sequence of bytecodes, the first two instructions, iload_0 and iload_1, push the ints stored in local variable positions zero and one onto the operand stack. The iadd instruction pops those two int values, adds them, and pushes the int result back onto the operand stack. The fourth instruction, istore_2, pops the result of the add off the top of the operand stack and stores it into local variable position two. In Figure 5-10, you can see a graphical depiction of the state of the local variables and operand stack while executing these instructions. In this figure, unused slots of the local variables and operand stack are left blank.



Figure 5-10. Adding two local variables.
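The four-instruction sequence above can be modeled in a few lines of Java. This is a toy sketch only, with invented names: the local variables become a zero-based int array and the operand stack a push/pop structure, exactly as described in the preceding sections.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal model of the iload_0 / iload_1 / iadd / istore_2 sequence.
public class AddInterpreterDemo {
    static void run(int[] locals) {
        Deque<Integer> operandStack = new ArrayDeque<>();
        operandStack.push(locals[0]);        // iload_0: push local variable 0
        operandStack.push(locals[1]);        // iload_1: push local variable 1
        int b = operandStack.pop();          // iadd: pop two ints,
        int a = operandStack.pop();
        operandStack.push(a + b);            //   add them, push the result
        locals[2] = operandStack.pop();      // istore_2: pop, store in local 2
    }

    public static void main(String[] args) {
        int[] locals = {3, 4, 0};
        run(locals);
        System.out.println(locals[2]); // 7
    }
}
```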

Frame Data

In addition to the local variables and operand stack, the Java stack frame includes data to support constant pool resolution, normal method return, and exception dispatch. This data is stored in the frame data portion of the Java stack frame.

Many instructions in the Java virtual machine's instruction set refer to entries in the constant pool. Some instructions merely push constant values of type int, long, float, double, or String from the constant pool onto the operand stack. Some instructions use constant pool entries to refer to classes or arrays to instantiate, fields to access, or methods to invoke. Other instructions determine whether a particular object is a descendant of a particular class or interface specified by a constant pool entry.

Whenever the Java virtual machine encounters any of the instructions that refer to an entry in the constant pool, it uses the frame data's pointer to the constant pool to access that information. As mentioned earlier, references to types, fields, and methods in the constant pool are initially symbolic. When the virtual machine looks up a constant pool entry that refers to a class, interface, field, or method, that reference may still be symbolic. If so, the virtual machine must resolve the reference at that time.

Aside from constant pool resolution, the frame data must assist the virtual machine in processing a normal or abrupt method completion. If a method completes normally (by returning), the virtual machine must restore the stack frame of the invoking method. It must set the pc register to point to the instruction in the invoking method that follows the instruction that invoked the completing method. If the completing method returns a value, the virtual machine must push that value onto the operand stack of the invoking method.

The frame data must also contain some kind of reference to the method's exception table, which the virtual machine uses to process any exceptions thrown during the course of execution of the method. An exception table, which is described in detail in Chapter 17, "Exceptions," defines ranges within the bytecodes of a method that are protected by catch clauses. Each entry in an exception table gives a starting and ending position of the range protected by a catch clause, an index into the constant pool that gives the exception class being caught, and a starting position of the catch clause's code.

When a method throws an exception, the Java virtual machine uses the exception table referred to by the frame data to determine how to handle the exception. If the virtual machine finds a matching catch clause in the method's exception table, it transfers control to the beginning of that catch clause. If the virtual machine doesn't find a matching catch clause, the method completes abruptly. The virtual machine uses the information in the frame data to restore the invoking method's frame. It then rethrows the same exception in the context of the invoking method.
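The matching step described above can be sketched as a linear search over the exception table. The entry layout and names below are illustrative (the real class-file encoding is covered in Chapter 6); the sketch assumes a half-open protected range and uses assignability to model the "matching catch clause" test.

```java
// Sketch of exception-table dispatch: each entry protects a bytecode
// range [startPc, endPc) and names the exception class it catches.
public class ExceptionTableDemo {
    record Entry(int startPc, int endPc, Class<?> catchType, int handlerPc) {}

    // Returns the handler pc of the first matching entry, or -1 to signal
    // abrupt completion (pop the frame, rethrow in the invoking method).
    static int findHandler(Entry[] table, int throwingPc, Class<?> thrown) {
        for (Entry e : table) {
            if (throwingPc >= e.startPc() && throwingPc < e.endPc()
                    && e.catchType().isAssignableFrom(thrown)) {
                return e.handlerPc();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Entry[] table = {
            new Entry(0, 8, ArithmeticException.class, 12),
            new Entry(0, 8, RuntimeException.class, 20)
        };
        System.out.println(findHandler(table, 4, ArithmeticException.class)); // 12
        System.out.println(findHandler(table, 9, ArithmeticException.class)); // -1
    }
}
```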

In addition to data to support constant pool resolution, normal method return, and exception dispatch, the stack frame may also include other information that is implementation dependent, such as data to support debugging.

Possible Implementations of the Java Stack

Implementation designers can represent the Java stack in whatever way they wish. As mentioned earlier, one potential way to implement the stack is by allocating each frame separately from a heap. As an example of this approach, consider the following class:

// On CD-ROM in file jvm/ex3/Example3c.java
class Example3c {

    public static void addAndPrint() {
        double result = addTwoTypes(1, 88.88);
        System.out.println(result);
    }

    public static double addTwoTypes(int i, double d) {
        return i + d;
    }
}

Figure 5-11 shows three snapshots of the Java stack for a thread that invokes the addAndPrint() method. In the implementation of the Java virtual machine represented in this figure, each frame is allocated separately from a heap. To invoke the addTwoTypes() method, the addAndPrint() method first pushes an int one and double 88.88 onto its operand stack. It then invokes the addTwoTypes() method.



Figure 5-11. Allocating frames from a heap.

The instruction to invoke addTwoTypes() refers to a constant pool entry. The Java virtual machine looks up the entry and resolves it if necessary.

Note that the addAndPrint() method uses the constant pool to identify the addTwoTypes() method, even though it is part of the same class. Like references to fields and methods of other classes, references to the fields and methods of the same class are initially symbolic and must be resolved before they are used.

The resolved constant pool entry points to information in the method area about the addTwoTypes() method. The virtual machine uses this information to determine the sizes required by addTwoTypes() for the local variables and operand stack. In the class file generated by Sun's javac compiler from the JDK 1.1, addTwoTypes() requires three words in the local variables and four words in the operand stack. (As mentioned earlier, the size of the frame data portion is implementation dependent.) The virtual machine allocates enough memory for the addTwoTypes() frame from a heap. It then pops the double and int parameters (88.88 and one) from addAndPrint()'s operand stack and places them into addTwoTypes()'s local variables: the double into slots one and two, the int into slot zero.

When addTwoTypes() returns, it first pushes the double return value (in this case, 89.88) onto its operand stack. The virtual machine uses the information in the frame data to locate the stack frame of the invoking method, addAndPrint(). It pushes the double return value onto addAndPrint()'s operand stack and frees the memory occupied by addTwoTypes()'s frame. It makes addAndPrint()'s frame current and continues executing the addAndPrint() method at the first instruction past the addTwoTypes() method invocation.

Figure 5-12 shows snapshots of the Java stack of a different virtual machine implementation executing the same methods. Instead of allocating each frame separately from a heap, this implementation allocates frames from a contiguous stack. This approach allows the implementation to overlap the frames of adjacent methods. The portion of the invoking method's operand stack that contains the parameters to the invoked method becomes the base of the invoked method's local variables. In this example, addAndPrint()'s entire operand stack becomes addTwoTypes()'s entire local variables section.



Figure 5-12. Allocating frames from a contiguous stack.

This approach saves memory space because the same memory is used by the calling method to store the parameters as is used by the invoked method to access the parameters. It saves time because the Java virtual machine doesn't have to spend time copying the parameter values from one frame to another.

Note that the operand stack of the current frame is always at the "top" of the Java stack. Although this may be easier to visualize in the contiguous memory implementation of Figure 5-12, it is true no matter how the Java stack is implemented. (As mentioned earlier, in all the graphical images of the stack shown in this book, the stack grows downwards. The "top" of the stack is always shown at the bottom of the picture.) Instructions that push values onto (or pop values off of) the operand stack always operate on the current frame. Thus, pushing a value onto the operand stack can be seen as pushing a value onto the top of the entire Java stack. In the remainder of this book, "pushing a value onto the stack" refers to pushing a value onto the operand stack of the current frame.

One other possible approach to implementing the Java stack is a hybrid of the two approaches shown in Figure 5-11 and Figure 5-12. A Java virtual machine implementation can allocate a chunk of contiguous memory from a heap when a thread starts. In this memory, the virtual machine can use the overlapping frames approach shown in Figure 5-12. If the stack outgrows the contiguous memory, the virtual machine can allocate another chunk of contiguous memory from the heap. It can use the separate frames approach shown in Figure 5-11 to connect the invoking method's frame sitting in the old chunk with the invoked method's frame sitting in the new chunk. Within the new chunk, it can once again use the contiguous memory approach.

Native Method Stacks

In addition to all the runtime data areas defined by the Java virtual machine specification and described previously, a running Java application may use other data areas created by or for native methods. When a thread invokes a native method, it enters a new world in which the structures and security restrictions of the Java virtual machine no longer hamper its freedom. A native method can likely access the runtime data areas of the virtual machine (it depends upon the native method interface), but can also do anything else it wants. It may use registers inside the native processor, allocate memory on any number of native heaps, or use any kind of stack.

Native methods are inherently implementation dependent. Implementation designers are free to decide what mechanisms they will use to enable a Java application running on their implementation to invoke native methods.

Any native method interface will use some kind of native method stack. When a thread invokes a Java method, the virtual machine creates a new frame and pushes it onto the Java stack. When a thread invokes a native method, however, that thread leaves the Java stack behind. Instead of pushing a new frame onto the thread's Java stack, the Java virtual machine will simply dynamically link to and directly invoke the native method. One way to think of it is that the Java virtual machine is dynamically extending itself with native code. It is as if the Java virtual machine implementation is just calling another (dynamically linked) method within itself, at the behest of the running Java program.

If an implementation's native method interface uses a C-linkage model, then the native method stacks are C stacks. When a C program invokes a C function, the stack operates in a certain way. The arguments to the function are pushed onto the stack in a certain order. The return value is passed back to the invoking function in a certain way. This would be the behavior of the native method stacks in that implementation.

A native method interface will likely (once again, it is up to the designers to decide) be able to call back into the Java virtual machine and invoke a Java method. In this case, the thread leaves the native method stack and enters another Java stack.

Figure 5-13 shows a graphical depiction of a thread that invokes a native method that calls back into the virtual machine to invoke another Java method. This figure shows the full picture of what a thread can expect inside the Java virtual machine. A thread may spend its entire lifetime executing Java methods, working with frames on its Java stack. Or, it may jump back and forth between the Java stack and native method stacks.



Figure 5-13. The stack for a thread that invokes Java and native methods.

As depicted in Figure 5-13, a thread first invoked two Java methods, the second of which invoked a native method. This act caused the virtual machine to use a native method stack. In this figure, the native method stack is shown as a finite amount of contiguous memory space. Assume it is a C stack. The stack area used by each C-linkage function is shown in gray and bounded by a dashed line. The first C-linkage function, which was invoked as a native method, invoked another C-linkage function. The second C-linkage function invoked a Java method through the native method interface. This Java method invoked another Java method, which is the current method shown in the figure.

As with the other runtime memory areas, the memory occupied by native method stacks need not be of a fixed size. It can expand and contract as needed by the running application. Implementations may allow users or programmers to specify an initial size for the native method stacks, as well as a maximum or minimum size.

Execution Engine

At the core of any Java virtual machine implementation is its execution engine. In the Java virtual machine specification, the behavior of the execution engine is defined in terms of an instruction set. For each instruction, the specification describes in detail what an implementation should do when it encounters the instruction as it executes bytecodes, but says very little about how. As mentioned in previous chapters, implementation designers are free to decide how their implementations will execute bytecodes. Their implementations can interpret, just-in-time compile, execute natively in silicon, use a combination of these, or dream up some brand new technique.

Similar to the three senses of the term "Java virtual machine" described at the beginning of this chapter, the term "execution engine" can also be used in any of three senses: an abstract specification, a concrete implementation, or a runtime instance. The abstract specification defines the behavior of an execution engine in terms of the instruction set. Concrete implementations, which may use a variety of techniques, are either software, hardware, or a combination of both. A runtime instance of an execution engine is a thread.

Each thread of a running Java application is a distinct instance of the virtual machine's execution engine. From the beginning of its lifetime to the end, a thread is either executing bytecodes or native methods. A thread may execute bytecodes directly, by interpreting or executing natively in silicon, or indirectly, by just-in-time compiling and executing the resulting native code. A Java virtual machine implementation may use other threads invisible to the running application, such as a thread that performs garbage collection. Such threads need not be "instances" of the implementation's execution engine. All threads that belong to the running application, however, are execution engines in action.

The Instruction Set

A method's bytecode stream is a sequence of instructions for the Java virtual machine. Each instruction consists of a one-byte opcode followed by zero or more operands. The opcode indicates the operation to be performed. Operands supply extra information needed by the Java virtual machine to perform the operation specified by the opcode. The opcode itself indicates whether or not it is followed by operands, and the form the operands (if any) take. Many Java virtual machine instructions take no operands, and therefore consist only of an opcode. Depending upon the opcode, the virtual machine may refer to data stored in other areas in addition to (or instead of) operands that trail the opcode. When it executes an instruction, the virtual machine may use entries in the current constant pool, entries in the current frame's local variables, or values sitting on the top of the current frame's operand stack.

The abstract execution engine runs by executing bytecodes one instruction at a time. This process takes place for each thread (execution engine instance) of the application running in the Java virtual machine. An execution engine fetches an opcode and, if that opcode has operands, fetches the operands. It executes the action requested by the opcode and its operands, then fetches another opcode. Execution of bytecodes continues until a thread completes either by returning from its starting method or by not catching a thrown exception.
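
The fetch-and-execute cycle just described can be sketched as a loop over a toy instruction set. Everything here -- the opcodes PUSH, ADD, and HALT, and the method names -- is hypothetical and purely illustrative; the real Java virtual machine instruction set is of course far larger:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class TinyInterpreter {

    // Hypothetical one-byte opcodes for a toy stack machine.
    static final int HALT = 0x00; // pop and return the top of the stack
    static final int PUSH = 0x01; // one operand follows: the value to push
    static final int ADD  = 0x02; // no operands: pop two values, push their sum

    static int execute(int[] bytecodes) {
        Deque<Integer> operandStack = new ArrayDeque<>();
        int pc = 0; // program counter: index of the next item to fetch
        while (true) {
            int opcode = bytecodes[pc++];           // fetch the opcode
            switch (opcode) {
                case PUSH:
                    operandStack.push(bytecodes[pc++]); // fetch the operand
                    break;
                case ADD:
                    operandStack.push(operandStack.pop() + operandStack.pop());
                    break;
                case HALT:
                    return operandStack.pop();
                default:
                    throw new IllegalStateException("bad opcode " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        // push 3, push 4, add, halt -> 7
        System.out.println(execute(new int[] {PUSH, 3, PUSH, 4, ADD, HALT}));
    }
}
```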

From time to time, the execution engine may encounter an instruction that requests a native method invocation. On such occasions, the execution engine will dutifully attempt to invoke that native method. When the native method returns (if it completes normally, not by throwing an exception), the execution engine will continue executing the next instruction in the bytecode stream.

One way to think of native methods, therefore, is as programmer-customized extensions to the Java virtual machine's instruction set. If an instruction requests an invocation of a native method, the execution engine invokes the native method. Running the native method is how the Java virtual machine executes the instruction. When the native method returns, the virtual machine moves on to the next instruction. If the native method completes abruptly (by throwing an exception), the virtual machine follows the same steps to handle the exception as it does when any instruction throws an exception.

Part of the job of executing an instruction is determining the next instruction to execute. An execution engine determines the next opcode to fetch in one of three ways. For many instructions, the next opcode to execute directly follows the current opcode and its operands, if any, in the bytecode stream. For some instructions, such as goto or return, the execution engine determines the next opcode as part of its execution of the current instruction. If an instruction throws an exception, the execution engine determines the next opcode to fetch by searching for an appropriate catch clause.

Several instructions can throw exceptions. The athrow instruction, for example, throws an exception explicitly. This instruction is the compiled form of the throw statement in Java source code. Every time the athrow instruction is executed, it will throw an exception. Other instructions throw exceptions only when certain conditions are encountered. For example, if the Java virtual machine discovers, to its chagrin, that the program is attempting to perform an integer divide by zero, it will throw an ArithmeticException. This can occur while executing any of four instructions--idiv, ldiv, irem, and lrem--which perform divisions or calculate remainders on ints or longs.
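
The divide-by-zero check can be observed from ordinary Java source. In the small example below, the division compiles to an idiv instruction, which throws ArithmeticException when the divisor is zero:

```java
public class DivideByZero {
    public static void main(String[] args) {
        int numerator = 7;
        int denominator = 0;
        try {
            // The compiler emits an idiv instruction for this expression;
            // the virtual machine throws ArithmeticException at run time.
            int quotient = numerator / denominator;
            System.out.println(quotient);
        } catch (ArithmeticException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```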

Each type of opcode in the Java virtual machine's instruction set has a mnemonic. In the typical assembly language style, streams of Java bytecodes can be represented by their mnemonics followed by (optional) operand values.

For an example of a method's bytecode stream and mnemonics, consider the doMathForever() method of this class:

// On CD-ROM in file jvm/ex4/Act.java
class Act {

    public static void doMathForever() {
        int i = 0;
        for (;;) {
            i += 1;
            i *= 2;
        }
    }
}

The stream of bytecodes for doMathForever() can be disassembled into mnemonics as shown next. The Java virtual machine specification does not define any official syntax for representing the mnemonics of a method's bytecodes. The code shown next illustrates the manner in which streams of bytecode mnemonics will be represented in this book. The left hand column shows the offset in bytes from the beginning of the method's bytecodes to the start of each instruction. The center column shows the instruction and any operands. The right hand column contains comments, which are preceded with a double slash, just as in Java source code.

// Bytecode stream: 03 3b 84 00 01 1a 05 68 3b a7 ff f9
// Disassembly:
// Method void doMathForever()
// Left column: offset of instruction from beginning of method
// |   Center column: instruction mnemonic and any operands
// |   |                   Right column: comment
   0   iconst_0           // 03
   1   istore_0           // 3b
   2   iinc 0, 1          // 84 00 01
   5   iload_0            // 1a
   6   iconst_2           // 05
   7   imul               // 68
   8   istore_0           // 3b
   9   goto 2             // a7 ff f9

This way of representing mnemonics is very similar to the output of the javap program of Sun's Java 2 SDK. javap allows you to look at the bytecode mnemonics of the methods of any class file. Note that jump addresses are given as offsets from the beginning of the method. The goto instruction causes the virtual machine to jump to the instruction at offset two (an iinc). The actual operand in the stream is minus seven. To execute this instruction, the virtual machine adds the operand to the current contents of the pc register. The result is the address of the iinc instruction at offset two. To make the mnemonics easier to read, the operands for jump instructions are shown as if the addition has already taken place. Instead of saying "goto -7," the mnemonics say, "goto 2."

The central focus of the Java virtual machine's instruction set is the operand stack. Values are generally pushed onto the operand stack before they are used. Although the Java virtual machine has no registers for storing arbitrary values, each method has a set of local variables. The instruction set treats the local variables, in effect, as a set of registers that are referred to by indexes. Nevertheless, other than the iinc instruction, which increments a local variable directly, values stored in the local variables must be moved to the operand stack before being used.

For example, to divide one local variable by another, the virtual machine must push both onto the stack, perform the division, and then store the result back into the local variables. To move the value of an array element or object field into a local variable, the virtual machine must first push the value onto the stack, then store it into the local variable. To set an array element or object field to a value stored in a local variable, the virtual machine must follow the reverse procedure. First, it must push the value of the local variable onto the stack, then pop it off the stack and into the array element or object field on the heap.
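
For instance, a method that divides one parameter by another compiles to exactly the push-operate-store pattern described above. The mnemonics in the comments are typical javac output for such a method, though exact output can vary by compiler version:

```java
public class Divide {

    // Dividing one local variable by another: both values travel
    // through the operand stack, never directly between locals.
    static int divide(int a, int b) {
        return a / b;
        // iload_0   // push local variable 0 (a) onto the operand stack
        // iload_1   // push local variable 1 (b) onto the operand stack
        // idiv      // pop two ints, divide, push the quotient
        // ireturn   // pop the quotient and return it
    }

    public static void main(String[] args) {
        System.out.println(divide(84, 2));
    }
}
```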

Several goals--some conflicting--guided the design of the Java virtual machine's instruction set. These goals are basically the same as those described in Part I of this book as the motivation behind Java's entire architecture: platform independence, network mobility, and security.

The platform independence goal was a major influence in the design of the instruction set. The instruction set's stack-centered approach, described previously, was chosen over a register-centered approach to facilitate efficient implementation on architectures with few or irregular registers, such as the Intel 80X86. This feature of the instruction set--the stack-centered design--makes it easier to implement the Java virtual machine on a wide variety of host architectures.

Another motivation for Java's stack-centered instruction set is that compilers usually use a stack-based architecture to pass an intermediate compiled form of the compiled program to a linker/optimizer. The Java class file, which is in many ways similar to the UNIX .o or Windows .obj file emitted by a C compiler, really represents an intermediate compiled form of a Java program. In the case of Java, the virtual machine serves as (dynamic) linker and may serve as optimizer. The stack-centered architecture of the Java virtual machine's instruction set facilitates the optimization that may be performed at run-time in conjunction with execution engines that perform just-in-time compiling or adaptive optimization.

As mentioned in Chapter 4, "Network Mobility," one major design consideration was class file compactness. Compactness is important because it facilitates speedy transmission of class files across networks. In the bytecodes stored in class files, all instructions--except two that deal with table jumping--are aligned on byte boundaries. The total number of opcodes is small enough so that opcodes occupy only one byte. This design strategy favors class file compactness possibly at the cost of some performance when the program runs. In some Java virtual machine implementations, especially those executing bytecodes in silicon, the single-byte opcode may preclude certain optimizations that could improve performance. Also, better performance may have been possible on some implementations if the bytecode streams were word-aligned instead of byte-aligned. (An implementation could always realign bytecode streams, or translate opcodes into a more efficient form as classes are loaded. Bytecodes are byte-aligned in the class file and in the specification of the abstract method area and execution engine. Concrete implementations can store the loaded bytecode streams any way they wish.)

Another goal that guided the design of the instruction set was the ability to do bytecode verification, especially all at once by a data flow analyzer. The verification capability is needed as part of Java's security framework. The ability to use a data flow analyzer on the bytecodes when they are loaded, rather than verifying each instruction as it is executed, facilitates execution speed. One way this design goal manifests itself in the instruction set is that most opcodes indicate the type they operate on.

For example, instead of simply having one instruction that pops a word from the operand stack and stores it in a local variable, the Java virtual machine's instruction set has two. One instruction, istore, pops and stores an int. The other instruction, fstore, pops and stores a float. Both of these instructions perform the exact same function when executed: they pop a word and store it. Distinguishing between popping and storing an int versus a float is important only to the verification process.

For many instructions, the virtual machine needs to know the types being operated on to know how to perform the operation. For example, the Java virtual machine supports two ways of adding two words together, yielding a one-word result. One addition treats the words as ints, the other as floats. The difference between these two instructions facilitates verification, but also tells the virtual machine whether it should perform integer or floating point arithmetic.
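
At the source level the distinction is invisible, but the two additions below compile to different instructions -- iadd and fadd -- even though both pop two words and push one:

```java
public class TypedAdd {

    // Compiles to an iadd instruction: integer arithmetic.
    static int addInts(int a, int b) {
        return a + b;
    }

    // Compiles to an fadd instruction: floating point arithmetic.
    static float addFloats(float a, float b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(addInts(1, 2));
        System.out.println(addFloats(0.5f, 0.25f));
    }
}
```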

A few instructions operate on any type. The dup instruction, for example, duplicates the top word of a stack irrespective of its type. Some instructions, such as goto, don't operate on typed values. The majority of the instructions, however, operate on a specific type. The mnemonics for most of these "typed" instructions indicate their type by a single character prefix that starts their mnemonic. Table 5-2 shows the prefixes for the various types. A few instructions, such as arraylength or instanceof, don't include a prefix because their type is obvious. The arraylength opcode requires an array reference. The instanceof opcode requires an object reference.

Type       Code   Example   Description
byte       b      baload    load byte from array
short      s      saload    load short from array
int        i      iaload    load int from array
long       l      laload    load long from array
char       c      caload    load char from array
float      f      faload    load float from array
double     d      daload    load double from array
reference  a      aaload    load reference from array

Table 5-2. Type prefixes of bytecode mnemonics

Values on the operand stack must be used in a manner appropriate to their type. It is illegal, for example, to push four ints, then add them as if they were two longs. It is illegal to push a float value onto the operand stack from the local variables, then store it as an int in an array on the heap. It is illegal to push a double value from an object field on the heap, then store the topmost of its two words into the local variables as a value of type reference. The strict type rules that are enforced by Java compilers must also be enforced by Java virtual machine implementations.

Implementations must also observe rules when executing instructions that perform generic stack operations independent of type. As mentioned previously, the dup instruction pushes a copy of the top word of the stack, irrespective of type. This instruction can be used on any value that occupies one word: an int, float, reference, or returnAddress. It is illegal, however, to use dup when the top of the stack contains either a long or double, the data types that occupy two consecutive operand stack locations. A long or double sitting on the top of the operand stack can be duplicated in its entirety by the dup2 instruction, which pushes a copy of the top two words onto the operand stack. The generic instructions cannot be used to split up dual-word values.

To keep the instruction set small enough to enable each opcode to be represented by a single byte, not all operations are supported on all types. Most operations are not supported for types byte, short, and char. These types are converted to int when moved from the heap or method area to the stack frame. They are operated on as ints, then converted back to byte, short, or char before being stored back into the heap or method area.
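
This conversion is visible at the language level: arithmetic on byte operands is carried out as int, which is why the Java compiler demands a cast before the result can be narrowed back into a byte:

```java
public class BytePromotion {
    public static void main(String[] args) {
        byte a = 100;
        byte b = 27;
        // a + b is computed as int on the operand stack (iadd), so its
        // static type is int and it fits in an int variable directly.
        int sum = a + b;                     // 127
        // Narrowing back to byte requires an explicit cast; here the
        // int value 128 wraps around to -128 when narrowed.
        byte narrowed = (byte) (a + b + 1);  // -128
        System.out.println(sum);
        System.out.println(narrowed);
    }
}
```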

Table 5-3 shows the computation types that correspond to each storage type in the Java virtual machine. As used here, a storage type is the manner in which values of the type are represented on the heap. The storage type corresponds to the type of the variable in Java source code. A computation type is the manner in which the type is represented on the Java stack frame.

Storage Type   Minimum Bits in Heap   Computation Type   Words in the
               or Method Area                            Java Stack Frame
byte           8                      int                1
short          16                     int                1
int            32                     int                1
long           64                     long               2
char           16                     int                1
float          32                     float              1
double         64                     double             2
reference      32                     reference          1

Table 5-3. Storage and computation types inside the Java virtual machine

Implementations of the Java virtual machine must in some way ensure that values are operated on by instructions appropriate to their type. They can verify bytecodes up front as part of the class verification process, on the fly as the program executes, or some combination of both. Bytecode verification is described in more detail in Chapter 7, "The Lifetime of a Type." The entire instruction set is covered in detail in Chapters 10 through 20.

Execution Techniques

Various execution techniques that may be used by an implementation--interpreting, just-in-time compiling, adaptive optimization, native execution in silicon--were described in Chapter 1, "Introduction to Java's Architecture." The main point to remember about execution techniques is that an implementation can use any technique to execute bytecodes so long as it adheres to the semantics of the Java virtual machine instruction set.

One of the most interesting -- and speedy -- execution techniques is adaptive optimization. The adaptive optimization technique, which is used by several existing Java virtual machine implementations, including Sun's Hotspot virtual machine, borrows from techniques used by earlier virtual machine implementations. The original JVMs interpreted bytecodes one at a time. Second-generation JVMs added a JIT compiler, which compiles each method to native code upon first execution, then executes the native code. Thereafter, whenever the method is called, the native code is executed. Adaptive optimizers, taking advantage of information available only at run-time, attempt to combine bytecode interpretation and compilation to native in the way that will yield optimum performance.

An adaptive optimizing virtual machine begins by interpreting all code, but it monitors the execution of that code. Most programs spend 80 to 90 percent of their time executing 10 to 20 percent of the code. By monitoring the program execution, the virtual machine can figure out which methods represent the program's "hot spot" -- the 10 to 20 percent of the code that is executed 80 to 90 percent of the time.

When the adaptive optimizing virtual machine decides that a particular method is in the hot spot, it fires off a background thread that compiles those bytecodes to native and heavily optimizes the native code. Meanwhile, the program can still execute that method by interpreting its bytecodes. Because the program isn't held up and because the virtual machine is only compiling and optimizing the "hot spot" (perhaps 10 to 20 percent of the code), the virtual machine has more time than a traditional JIT to perform optimizations.

The adaptive optimization approach yields a program in which the code that is executed 80 to 90 percent of the time is native code as heavily optimized as statically compiled C++, with a memory footprint not much bigger than a fully interpreted Java program. In other words, fast. An adaptive optimizing virtual machine can keep the old bytecodes around in case a method moves out of the hot spot. (The hot spot may move somewhat as the program executes.) If a method moves out of the hot spot, the virtual machine can discard the compiled code and revert back to interpreting that method's bytecodes.

As you may have noticed, an adaptive optimizer's approach to making Java programs run fast is similar to the approach programmers should take to improve a program's performance. An adaptive optimizing virtual machine, unlike a regular JIT compiling virtual machine, doesn't do "premature optimization." The adaptive optimizing virtual machine begins by interpreting bytecodes. As the program runs, the virtual machine "profiles" the program to find the program's "hot spot," that 10 to 20 percent of the code that gets executed 80 to 90 percent of the time. And like a good programmer, the adaptive optimizing virtual machine just focuses its optimization efforts on that time-critical code.

But there is a bit more to the adaptive optimization story. Adaptive optimizers can be tuned for the run-time characteristics of Java programs -- in particular, of "well-designed" Java programs. According to David Griswold, Hotspot manager at JavaSoft, "Java is a lot more object-oriented than C++. You can measure that; you can look at the rates of method invocations, dynamic dispatches, and such things. And the rates [for Java] are much higher than they are in C++." Now this high rate of method invocations and dynamic dispatches is especially true in a well-designed Java program, because one aspect of a well-designed Java program is highly factored, fine-grained design -- in other words, lots of compact, cohesive methods and compact, cohesive objects.

This run-time characteristic of Java programs, the high frequency of method invocations and dynamic dispatches, affects performance in two ways. First, there is an overhead associated with each dynamic dispatch. Second, and more significantly, method invocations reduce the effectiveness of compiler optimization.

Method invocations reduce the effectiveness of optimizers because optimizers don't perform well across method invocation boundaries. As a result, optimizers end up focusing on the code between method invocations. And the greater the method invocation frequency, the less code the optimizer has to work with between method invocations, and the less effective the optimization becomes.

The standard solution to this problem is inlining -- the copying of an invoked method's body directly into the body of the invoking method. Inlining eliminates method calls and gives the optimizer more code to work with. It makes possible more effective optimization at the cost of increasing the run-time memory footprint of the program.

The trouble is that inlining is harder with object-oriented languages, such as Java and C++, than with non-object-oriented languages, such as C, because object-oriented languages use dynamic dispatching. And the problem is worse in Java than in C++, because Java has a greater call frequency and a greater percentage of dynamic dispatches than C++.

A regular optimizing static compiler for a C program can inline straightforwardly because there is one function implementation for each function call. The trouble with doing inlining with object-oriented languages is that dynamic method dispatch means there may be multiple function (or method) implementations for any given function call. In other words, the JVM may have many different implementations of a method to choose from at run time, based on the class of the object on which the method is being invoked.

One solution to the problem of inlining a dynamically dispatched method call is to just inline all of the method implementations that may get selected at run-time. The trouble with this solution is that in cases where there are a lot of method implementations, the size of the optimized code can grow very large.

One advantage adaptive optimization has over static compilation is that, because it is happening at runtime, it can use information not available to a static compiler. For example, even though there may be 30 possible implementations that may get called for a particular method invocation, at run-time perhaps only two of them are ever called. The adaptive optimization approach enables only those two to be inlined, thereby minimizing the size of the optimized code.
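
The situation can be sketched in plain Java. A static compiler cannot safely inline the call in totalArea() below, because the receiver's class is only known at run time; but an adaptive optimizer that observes only one receiver class at that call site can inline that implementation speculatively, guarded by a quick class check. The class and method names here are illustrative only:

```java
public class Dispatch {

    interface Shape { double area(); }

    static class Circle implements Shape {
        final double r;
        Circle(double r) { this.r = r; }
        public double area() { return Math.PI * r * r; }
    }

    static class Square implements Shape {
        final double s;
        Square(double s) { this.s = s; }
        public double area() { return s * s; }
    }

    // shape.area() is a dynamically dispatched call: either Circle.area()
    // or Square.area() may run, depending on the object's class. If
    // run-time profiling shows only Square ever reaches this call site,
    // an adaptive optimizer can inline Square.area() here.
    static double totalArea(Shape[] shapes) {
        double total = 0.0;
        for (Shape shape : shapes) {
            total += shape.area();
        }
        return total;
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Square(2.0), new Square(3.0) };
        System.out.println(totalArea(shapes)); // 4.0 + 9.0 = 13.0
    }
}
```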

Threads

The Java virtual machine specification defines a threading model that aims to facilitate implementation on a wide variety of architectures. One goal of the Java threading model is to enable implementation designers, where possible and appropriate, to use native threads. Alternatively, designers can implement a thread mechanism as part of their virtual machine implementation. One advantage to using native threads on a multi-processor host is that different threads of a Java application could run simultaneously on different processors.

One tradeoff of Java's threading model is that the specification of priorities is lowest-common-denominator. A Java thread can run at any one of ten priorities. Priority one is the lowest, and priority ten is the highest. If designers use native threads, they can map the ten Java priorities onto the native priorities in whatever way seems most appropriate. The Java virtual machine specification defines the behavior of threads at different priorities only by saying that all threads at the highest priority will get some CPU time. Threads at lower priorities are guaranteed to get CPU time only when all higher priority threads are blocked. Lower priority threads may get some CPU time when higher priority threads aren't blocked, but there are no guarantees.
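
At the language level the ten priorities appear as constants of class Thread, as in this small sketch; setting a priority is, as the text notes, only a hint to the scheduler:

```java
public class PriorityDemo {
    public static void main(String[] args) {
        Thread worker = new Thread(() -> {
            // low-priority background work would go here
        });
        // Priorities run from Thread.MIN_PRIORITY (1) to
        // Thread.MAX_PRIORITY (10); Thread.NORM_PRIORITY is 5.
        worker.setPriority(Thread.MIN_PRIORITY);
        worker.start();
        System.out.println(Thread.MIN_PRIORITY);
        System.out.println(Thread.MAX_PRIORITY);
    }
}
```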

The specification doesn't assume time-slicing between threads of different priorities, because not all architectures time-slice. (As used here, time-slicing means that all threads at all priorities will be guaranteed some CPU time, even when no threads are blocked.) Even among those architectures that do time-slice, the algorithms used to allot time slots to threads at various priorities can differ greatly.

As mentioned in Chapter 2, "Platform Independence," you must not rely on time-slicing for program correctness. You should use thread priorities only to give the Java virtual machine hints at what it should spend more time on. To coordinate the activities of multiple threads, you should use synchronization.

The thread implementation of any Java virtual machine must support two aspects of synchronization: object locking and thread wait and notify. Object locking helps keep threads from interfering with one another while working independently on shared data. Thread wait and notify helps threads to cooperate with one another while working together toward some common goal. Running applications access the Java virtual machine's locking capabilities via the instruction set, and its wait and notify capabilities via the wait(), notify(), and notifyAll() methods of class Object. For more details, see Chapter 20, "Thread Synchronization."
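
A minimal wait-and-notify handshake between two threads might look like the following sketch. The loop around wait() guards against spurious wakeups, and notifyAll() must be called while holding the same lock:

```java
public class WaitNotify {

    private final Object lock = new Object();
    private boolean ready = false;
    private boolean observed = false;

    boolean handshake() throws InterruptedException {
        Thread consumer = new Thread(() -> {
            synchronized (lock) {
                while (!ready) {     // loop guards against spurious wakeups
                    try {
                        lock.wait(); // releases the lock while waiting
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
                observed = true;
            }
        });
        consumer.start();

        synchronized (lock) {
            ready = true;
            lock.notifyAll();        // wake any thread waiting on this lock
        }
        consumer.join();             // join establishes happens-before
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(new WaitNotify().handshake());
    }
}
```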

In the Java virtual machine Specification, the behavior of Java threads is defined in terms of variables, a main memory, and working memories. Each Java virtual machine instance has a main memory, which contains all the program's variables: instance variables of objects, components of arrays, and class variables. Each thread has a working memory, in which the thread stores "working copies" of variables it uses or assigns. Local variables and parameters, because they are private to individual threads, can be logically seen as part of either the working memory or main memory.

The Java virtual machine specification defines many rules that govern the low-level interactions of threads with main memory. For example, one rule states that all operations on primitive types, except in some cases longs and doubles, are atomic. For example, if two threads compete to write two different values to an int variable, even in the absence of synchronization, the variable will end up with one value or the other. The variable will not contain a corrupted value. In other words, one thread will win the competition and write its value to the variable first. The losing thread need not sulk, however, because it will write its value to the variable second, overwriting the "winning" thread's value.

The exception to this rule is any long or double variable that is not declared volatile. Rather than being treated as a single atomic 64-bit value, such variables may be treated by some implementations as two atomic 32-bit values. Storing a non-volatile long to memory, for example, could involve two 32-bit write operations. This non-atomic treatment of longs and doubles means that two threads competing to write two different values to a long or double variable can legally yield a corrupted result.

Although implementation designers are not required to treat operations involving non-volatile longs and doubles atomically, the Java virtual machine specification encourages them to do so anyway. This non-atomic treatment of longs and doubles is an exception to the general rule that operations on primitive types are atomic. This exception is intended to facilitate efficient implementation of the threading model on processors that don't provide efficient ways to transfer 64-bit values to and from memory. In the future, this exception may be eliminated. For the time being, however, Java programmers must be sure to synchronize access to shared longs and doubles.
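
Two defensive idioms follow from this advice, sketched below: declare the shared 64-bit field volatile, which obliges the implementation to treat it atomically, or guard a plain long with synchronized accessors:

```java
public class SharedCounter {

    // Option 1: volatile forces atomic treatment of the 64-bit value.
    private volatile long volatileCount = 0L;

    // Option 2: synchronized accessors guard a plain long.
    private long guardedCount = 0L;

    synchronized void incrementGuarded() { guardedCount++; }
    synchronized long guardedValue()     { return guardedCount; }

    void setVolatile(long value) { volatileCount = value; }
    long volatileValue()         { return volatileCount; }

    public static void main(String[] args) {
        SharedCounter c = new SharedCounter();
        c.setVolatile(1L << 40); // a value that spans both 32-bit halves
        c.incrementGuarded();
        System.out.println(c.volatileValue());
        System.out.println(c.guardedValue());
    }
}
```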

Fundamentally, the rules governing low-level thread behavior specify when a thread may and when it must:

  1. copy values of variables from the main memory to its working memory, and
  2. write values from its working memory back into the main memory.

For certain conditions, the rules specify a precise and predictable order of memory reads and writes. For other conditions, however, the rules do not specify any order. The rules are designed to enable Java programmers to build multi-threaded programs that exhibit predictable behavior, while giving implementation designers some flexibility. This flexibility enables designers of Java virtual machine implementations to take advantage of standard hardware and software techniques that can improve the performance of multi-threaded applications.

The fundamental high-level implication of all the low-level rules that govern the behavior of threads is this: If access to certain variables isn't synchronized, threads are allowed to update those variables in main memory in any order. Without synchronization, your multi-threaded applications may exhibit surprising behavior on some Java virtual machine implementations. With proper use of synchronization, however, you can create multi-threaded Java applications that behave in a predictable way on any implementation of the Java virtual machine.
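
For example, the increment below is a read-modify-write sequence that is not atomic by itself; with the lock, two threads always produce the same total on any conforming implementation (a small sketch):

```java
public class SafeIncrement {

    private static int counter = 0;
    private static final Object lock = new Object();

    static int runThreads() throws InterruptedException {
        counter = 0;
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                synchronized (lock) {
                    counter++; // read-modify-write, made atomic by the lock
                }
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return counter;
    }

    public static void main(String[] args) throws InterruptedException {
        // With synchronization the result is 20000 on every conforming
        // implementation; without the lock, updates could be lost.
        System.out.println(runThreads());
    }
}
```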

Native Method Interface

Java virtual machine implementations aren't required to support any particular native method interface. Some implementations may support no native method interfaces at all. Others may support several, each geared towards a different purpose.

Sun's Java Native Interface, or JNI, is geared towards portability. JNI is designed so it can be supported by any implementation of the Java virtual machine, no matter what garbage collection technique or object representation the implementation uses. This in turn enables developers to link the same (JNI compatible) native method binaries to any JNI-supporting virtual machine implementation on a particular host platform.

Implementation designers can choose to create proprietary native method interfaces in addition to, or instead of, JNI. To achieve its portability, the JNI uses a lot of indirection through pointers to pointers and pointers to functions. To obtain the ultimate in performance, designers of an implementation may decide to offer their own low-level native method interface that is tied closely to the structure of their particular implementation. Designers could also decide to offer a higher-level native method interface than JNI, such as one that brings Java objects into a component software model.

To do useful work, a native method must be able to interact to some degree with the internal state of the Java virtual machine instance. For example, a native method interface may allow native methods to do some or all of the following:

  • Pass and return data
  • Access instance variables or invoke methods in objects on the garbage-collected heap
  • Access class variables or invoke class methods
  • Access arrays
  • Lock an object on the heap for exclusive use by the current thread
  • Create new objects on the garbage-collected heap
  • Load new classes
  • Throw new exceptions
  • Catch exceptions thrown by Java methods that the native method invoked
  • Catch asynchronous exceptions thrown by the virtual machine
  • Indicate to the garbage collector that it no longer needs to use a particular object

Designing a native method interface that offers these services can be complicated. The design needs to ensure that the garbage collector doesn't free any objects that are being used by native methods. If an implementation's garbage collector moves objects to keep heap fragmentation at a minimum, the native method interface design must make sure that either:

  1. an object can be moved after its reference has been passed to a native method, or
  2. any objects whose references have been passed to a native method are pinned until the native method returns or otherwise indicates it is done with the objects.

As you can see, native method interfaces are very intertwined with the inner workings of a Java virtual machine.

The Real Machine

As mentioned at the beginning of this chapter, all the subsystems, runtime data areas, and internal behaviors defined by the Java virtual machine specification are abstract. Designers aren't required to organize their implementations around "real" components that map closely to the abstract components of the specification. The abstract internal components and behaviors are merely a vocabulary with which the specification defines the required external behavior of any Java virtual machine implementation.

In other words, an implementation can be anything on the inside, so long as it behaves like a Java virtual machine on the outside. Implementations must be able to recognize Java class files and must adhere to the semantics of the Java code the class files contain. But otherwise, anything goes. How bytecodes are executed, how the runtime data areas are organized, how garbage collection is accomplished, how threads are implemented, how the bootstrap class loader finds classes, what native method interfaces are supported: these are some of the many decisions left to implementation designers.

The flexibility of the specification gives designers the freedom to tailor their implementations to fit their circumstances. In some implementations, minimizing usage of resources may be critical. In other implementations, where resources are plentiful, maximizing performance may be the one and only goal.

By clearly marking the line between the external behavior and the internal implementation of a Java virtual machine, the specification preserves compatibility among all implementations while promoting innovation. Designers are encouraged to apply their talents and creativity towards building ever-better Java virtual machines.

Eternal Math: A Simulation

The CD-ROM contains several simulation applets that serve as interactive illustrations for the material presented in this book. The applet shown in Figure 5-14 simulates a Java virtual machine executing a few bytecodes. You can run this applet by loading applets/EternalMath.html from the CD-ROM into any Java-enabled web browser or applet viewer that supports JDK 1.0.

The instructions in the simulation represent the body of the doMathForever() method of class Act, shown previously in the "Instruction Set" section of this chapter. This simulation shows the local variables and operand stack of the current frame, the pc register, and the bytecodes in the method area. It also shows an optop register, which you can think of as part of the frame data of this particular implementation of the Java virtual machine. The optop register always points to one word beyond the top of the operand stack.

The applet has four buttons: Step, Reset, Run, and Stop. Each time you press the Step button, the Java virtual machine simulator will execute the instruction pointed to by the pc register. Initially, the pc register points to an iconst_0 instruction. The first time you press the Step button, therefore, the virtual machine will execute iconst_0. It will push a zero onto the stack and set the pc register to point to the next instruction to execute. Subsequent presses of the Step button will execute subsequent instructions and the pc register will lead the way. If you press the Run button, the simulation will continue with no further coaxing on your part until you press the Stop button. To start the simulation over, press the Reset button.

The value of each register (pc and optop) is shown two ways. The contents of each register, an integer offset from the beginning of either the method's bytecodes or the operand stack, is shown in an edit box. Also, a small arrow (either "pc>" or "optop>") indicates the location contained in the register.

In the simulation the operand stack is shown growing down the panel (up in memory offsets) as words are pushed onto it. The top of the stack recedes back up the panel as words are popped from it.

The doMathForever() method has only one local variable, i, which sits at array position zero. The first two instructions, iconst_0 and istore_0 initialize the local variable to zero. The next instruction, iinc, increments i by one. This instruction implements the i += 1 statement from doMathForever(). The next instruction, iload_0, pushes the value of the local variable onto the operand stack. iconst_2 pushes an int 2 onto the operand stack. imul pops the top two ints from the operand stack, multiplies them, and pushes the result. The istore_0 instruction pops the result of the multiply and puts it into the local variable. The previous four instructions implement the i *= 2 statement from doMathForever(). The last instruction, goto, sends the program counter back to the iinc instruction. The goto implements the for (;;) loop of doMathForever().

With enough patience and clicks of the Step button (or a long enough run of the Run button), you can get an arithmetic overflow. When the Java virtual machine encounters such a condition, it just truncates, as is shown by this simulation. It does not throw any exceptions.
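The truncating overflow just described is easy to reproduce outside the applet. The sketch below is a hypothetical Python re-implementation of the doMathForever() loop (not the applet's actual source), with Java's 32-bit two's-complement wraparound made explicit:

```python
def to_int32(n):
    """Truncate a Python int to Java's 32-bit two's-complement range."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n

def do_math_forever(steps):
    """Run the body of doMathForever() for a fixed number of iterations."""
    i = 0                       # iconst_0 / istore_0
    for _ in range(steps):
        i = to_int32(i + 1)     # iinc
        i = to_int32(i * 2)     # iload_0, iconst_2, imul, istore_0
    return i
```

After 31 iterations the value overflows and settles at -2; as in the simulator, the arithmetic simply wraps and no exception is thrown.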

For each step of the simulation, a panel at the bottom of the applet contains an explanation of what the next instruction will do. Happy clicking.



Figure 5-14. The Eternal Math applet.

On the CD-ROM

The CD-ROM contains the source code examples from this chapter in the jvm directory. The Eternal Math applet is contained in a web page on the CD-ROM in file applets/EternalMath.html. The source code for this applet is found alongside its class files, in the applets/JVMSimulators and applets/JVMSimulators/COM/artima/jvmsim directories.

The Resources Page

For links to more information about the Java virtual machine, visit the resources page: http://www.artima.com/insidejvm/resources/


2007/07/24 11:10
Source blog: 미소
Original: http://blog.naver.com/osang1997/40007954085
Limiting the Volume of Mail Sent and Received

A system's resource limits and the stability of its services are closely related. By default most services place almost no limits on the system resources a user may consume, and mail service is no exception.
Mail usage has been growing, and mail content has shifted from traditional plain text toward audio, images, and video, so message sizes keep increasing as well. Hardware and mail server software performance have improved along with it, but exchanging very large messages naturally raises the load on the system, which in turn affects other services on the same server. If the system provides outgoing mail service (SMTP) or incoming mail service (POP3), it is therefore worth placing sensible limits on the exchange of large files.

sendmail not only acts as an SMTP (outgoing mail) server, delivering local mail to the outside, but also receives mail sent from outside to accounts on the server and stores it there. By default there is no limit at all on the volume of mail sent or received, so messages larger than 10 MB can overload the server. It is best to set each limit (the volume of outgoing and of incoming mail) appropriately, as shown below.

>> How to limit the volume sent by the SMTP server.

In /etc/mail/sendmail.cf (or /etc/sendmail.cf; this depends on how sendmail was packaged), remove the comment from the MaxMessageSize line and enter an appropriate limit:

# maximum message size
O MaxMessageSize=5024000

With this setting, when the server is used as an outgoing mail server, messages with attachments over 5 MB cannot be sent; the same 5 MB limit also applies to programs such as mailing lists that send mail to the outside by invoking /usr/sbin/sendmail from the web.
5024000 is in bytes. To apply a change to this setting, refresh the sendmail daemon with killall -HUP sendmail.
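To sanity-check that the option really is active (and not still commented out), the file can be inspected programmatically. A minimal sketch; the helper name is my own invention:

```python
import re

def max_message_size(cf_text):
    """Return the active MaxMessageSize (in bytes) from sendmail.cf text,
    or None if the option is missing or commented out."""
    for line in cf_text.splitlines():
        # An active option line starts with 'O' (old short syntax had no space).
        m = re.match(r"O\s*MaxMessageSize\s*=\s*(\d+)", line)
        if m:
            return int(m.group(1))
    return None
```

A commented-out line such as `# O MaxMessageSize=...` is ignored, matching sendmail's own behavior.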

>> How to limit the volume received by the incoming mail server.

To limit the size of mail coming into the server from outside, configure the "Local and Program Mailer specification" section in the same file (sendmail.cf):

Mlocal, P=/usr/bin/procmail, F=lsDFMAw5:/|@qSPfhn9, S=10/30,
R=20/40, M=5024000, T=DNS/RFC822/X-Unix, A=procmail -Y -a $h -d $u

Add M=5024000 just before the T=DNS/RFC822/X-Unix part, as shown above. Again, 5024000 is in bytes; set whatever limit suits your system environment, and refresh sendmail after the change for it to take effect. With this configuration the server will not receive mail larger than 5 MB, and whoever sends such a message will receive an error reply like:

552 5.2.3 <antihong at tt.co.kr>... Message is too large; 5024000 bytes max
554 5.0.0 <antihong at tt.co.kr>... Service unavailable

In addition, there is a section such as:

# maximum number of recipients per SMTP envelope
O MaxRecipientsPerMessage=20

This sets the number of mail accounts to which a single message may be sent at once (including CC recipients); if you provide SMTP service it is a good idea to apply it. By default this value is also unlimited, so first remove the comment and then set an appropriate value to cap the number of simultaneous recipients per message.
(The setting above limits simultaneous (CC) delivery to 20 users per message.)
When the configuration is done, reload sendmail with killall -HUP sendmail to apply it.



Setting a Mail Volume Quota

Administrators generally know how to set a quota on each user's web space, but often do not know how to set a quota on mail volume when providing sendmail. Mail quotas are somewhat more involved, but they can be configured. Each user's mail is stored under the /var/spool/mail/ directory in a file owned by that user's account, and this property is exactly what the quota setup uses. Since quotas can be set per file system, simply set an additional quota on the /var partition besides the one on the users' home directories.
The way to set the quota is the same as usual. First open /etc/fstab: if /var is a separate partition, put the user quota (or group quota) setting on the /var partition; if not, put it on the / partition.

/dev/sda1             /home       ext2    defaults,usrquota=/home/.quota
/dev/sda8             /var        ext2    defaults,usrquota=/var/.mailquota

Here quotas are configured on both the /home partition and the /var partition. Next, create zero-byte files with touch /home/.quota and touch /var/.mailquota, then run quotacheck -a; it scans the file systems, checks disk usage, and stores the information in those files.

Running edquota user shows:

/dev/sda1: blocks in use: 0, limits (soft = 99980, hard = 99980)
       inodes in use: 0, limits (soft = 0, hard = 0)
/dev/sda8: blocks in use: 0, limits (soft = 29980, hard = 29980)
       inodes in use: 0, limits (soft = 0, hard = 0)

Here /dev/sda1 is the quota setting for the /home directory and /dev/sda8 the one for the /var directory. The settings above allocate 100 MB for each user's /home directory and 30 MB for mail, 130 MB in total. Note that if there is no separate /var partition and only a / partition, a 100 MB quota there covers web space and mail volume combined.
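The arithmetic behind those limits: edquota counts 1 KB blocks, so megabyte quotas convert as sketched below (the helper is hypothetical; the article's 99980/29980 figures are simply round numbers set slightly under these exact values):

```python
def mb_to_blocks(mb, block_size=1024):
    """Convert a megabyte quota to the count of 1 KB blocks used by edquota."""
    return mb * 1024 * 1024 // block_size

home_blocks = mb_to_blocks(100)  # 102400 blocks, roughly the 99980 limit on /home
mail_blocks = mb_to_blocks(30)   # 30720 blocks, roughly the 29980 limit on /var
```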

##################### Note: about quota settings
When using edquota as above, the related lines look like the ones shown.
Of these,
"blocks in use:" shows, in kilobytes, the total number of blocks the user currently occupies on the partition, and
"inodes in use:" shows the total number of files the user currently has on the partition.
Both "blocks in use:" and "inodes in use:" are set and maintained automatically by the system, so there is no need to change these values by hand.
In a quota entry, the soft limit (soft = 5000) is the maximum amount of space the user may use (about 5 MB in this example), while the hard limit (hard = 6000) is the absolute disk usage the user cannot exceed. The hard limit applies only when the "grace period" option is set.
The grace period is the length of time a quota-limited user or group may continue working after exceeding the soft limit. For example, you might set a policy for your systems such as "limit each user's home directory to 50 MB, with a 7-day grace period when exceeded"; define whatever period you consider appropriate. The grace period can be checked and set with edquota -t; in the case below it is set to 7 days:

/dev/sda1: block grace period: 7 days, file grace period: 7 days
/dev/sda8: block grace period: 7 days, file grace period: 7 days

To apply one user's quota settings to other users as-is, use the -p option. Run as below, edquota copies the quota settings of "user" to every user defined in /etc/passwd whose UID is greater than 499:

edquota -p user `awk -F: '$3 > 499 {print $1}' /etc/passwd`
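The awk one-liner's selection logic, re-expressed as a Python sketch for readability (the helper name is my own):

```python
def users_over_uid(passwd_text, min_uid=499):
    """Mimic `awk -F: '$3 > 499 {print $1}'`: return the login names
    whose numeric UID (third colon-separated field) exceeds min_uid."""
    names = []
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) >= 3 and fields[2].isdigit() and int(fields[2]) > min_uid:
            names.append(fields[0])
    return names
```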

####################################################################


If mail is sent to an account whose quota has been exceeded, an error message like the one below is produced and the account can no longer receive mail.







How to Tell Whether sendmail Is Working Properly

There are two ways to check whether sendmail is currently running.

(1) Check with # ps auxw | grep sendmail
When you check this way and see

root      0.0  0.0  2684 1460         S    Aug24 sendmail: accepting connections on port 25

the line sendmail: accepting connections on port 25 means it is working normally. If sendmail has gone down and is not working, you will see the message sendmail: rejecting connections instead.

(2) Connect to port 25, where sendmail answers.

# telnet tt.co.kr 25
Trying 211.47.66.50...
Connected to tt.co.kr.
Escape character is '^]'.
220 www10.tt.co.kr ESMTP Today and Tomorrow (http://tt.co.kr/)

When you telnet to port 25, where sendmail is bound, sendmail responds; a reply like the above means it is ready to accept connections. When it does not respond, you see instead

Trying  tt.co.kr...
telnet: Unable to connect to remote host: Connection refused

showing that the connection was refused.
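The same port-25 check can be scripted. A minimal sketch, assuming nothing beyond the standard library; the function name smtp_banner is my own, and the host/port are whatever you want to probe:

```python
import socket

def smtp_banner(host, port=25, timeout=5.0):
    """Open a TCP connection to an SMTP port and return the greeting line.
    A '220 ...' reply means the server is accepting connections."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        return conn.recv(1024).decode("ascii", "replace").strip()
```

A ConnectionRefusedError raised here corresponds to the "Connection refused" case shown above.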


When sendmail Suddenly Stops Working

There are mainly two cases in which sendmail stops working.

The first is that the system's load average has climbed too high, and the second is that the /var partition, where mail received by sendmail is stored, is 100% full.
By default, sendmail automatically stops working when the system's load average exceeds 12. This is a safeguard to keep sendmail from bringing the system down when the load has risen, for example under a denial-of-service attack.
To adjust this value, edit sendmail.cf at

# load average at which we refuse connections
O RefuseLA=12

and re-run sendmail with killall -HUP sendmail; the value should be tuned to each system. If your system's characteristics keep the load high, frequently above 12, you must adjust this value to your environment or mail from outside cannot be received. (The server must be bound to port 25 to receive incoming mail.) sendmail also stops working when the partition holding /var/spool/mail becomes full (100%); in that case delete unnecessary data, for example under /var/log/, so the partition containing /var/spool/mail does not stay at 100%. Once space is cleaned up and the partition is no longer full, you will find sendmail comes back to life automatically.
Also, when the system's load average exceeds 8, mail sent through the server is not transmitted immediately; it is first stored in the server's mail queue and sent later. This happens for the same reason, and the threshold is set appropriately in sendmail.cf at:

# load average at which we just queue messages
O QueueLA=8

For reference, the system's current load average can be checked with the w command. With w, the load average appears as three numbers such as 0.25, 0.40, 0.43, which are the average system load over the last 1, 5, and 15 minutes respectively.
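Putting the two thresholds together, the load-average figures map onto sendmail's behavior as sketched below (hypothetical helper names; the thresholds default to the RefuseLA/QueueLA values discussed above):

```python
import re

def parse_load_averages(text):
    """Extract the 1-, 5- and 15-minute load averages from output such as
    the last three numbers printed by `w` or `uptime`."""
    m = re.search(r"(\d+\.\d+),?\s+(\d+\.\d+),?\s+(\d+\.\d+)", text)
    return tuple(float(x) for x in m.groups()) if m else None

def sendmail_state(load1, refuse_la=12, queue_la=8):
    """Predict sendmail's reaction to a given 1-minute load average."""
    if load1 > refuse_la:
        return "refusing connections"
    if load1 > queue_la:
        return "queueing messages"
    return "normal"
```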



Blocking sendmail's Outgoing Mail (SMTP) Function

Even if relaying is blocked in sendmail, recent versions have a user authentication (SMTP AUTH) feature, so any user with an account on the server can use the mail server's SMTP function to send mail. To prevent this, upgrade to a recent version such as 8.11.4 or 8.11.5, then list the users allowed to send mail in the /etc/mail/smtpauth file. (A serious security problem was recently confirmed in versions before 8.11.6, so you must upgrade to 8.11.6 or 8.12.) If you create the file but enter no users, then nobody can send mail, even with an account on the server. Upgrading to the latest 8.11.6 is therefore recommended. Several variations on this exist; packet filtering with ipchains or iptables is another option.

For 2.2.x kernels:
ipchains -A output -p tcp -y -d 0/0 25 -j DENY
For 2.4.x kernels:
iptables -A OUTPUT -p tcp --syn --dport 25 -j DROP

These rules block only the connection-initiating (SYN) packets whose target port is 25, so mail cannot be sent. Since only SYN packets are filtered, receiving mail from outside is unaffected.

How to Filter Virus Mail

Virus mail such as Sircam and Nimda, which breaks out at regular intervals, causes server administrators no end of trouble. sendmail can block such virus mail and spam using rulesets; let's see how to use them.
sendmail can filter on all kinds of mail header information, such as the subject, the mailer, or an attachment's file name. First let's filter on the subject of incoming messages. Below is an example ruleset that blocks the ILOVEYOU (Love Letter) worm, which mails itself with the subject ILOVEYOU.

First, open the sendmail.cf file and add the following at the very bottom:

HSubject: $>Check_Subject
D{WORMmsg}Access Denied - This message may contain a virus.

SCheck_Subject
RILOVEYOU               $#error $: 501 ${WORMmsg}
RRe: ILOVEYOU           $#error $: 501 ${WORMmsg}
RFW: ILOVEYOU           $#error $: 501 ${WORMmsg}

# Note: the blank before $#error must be a tab, not spaces.
The sendmail.cf syntax is somewhat difficult and complex, so let's briefly go over what the settings mean.

H -- in this case, finds the Subject: string in the header and hands that header to the ruleset Check_Subject.
D -- defines a macro named WORMmsg containing the message sent back to the sender when a subject matching the ruleset is found.
S -- declares the ruleset referenced as Check_Subject in the header line.
R -- when mail containing the given string is found, attaches the error message defined earlier and bounces the mail.

With this ruleset, a subject with spaces such as "I LOVE YOU" is not matched, nor is one with extra words such as "ILOVEYOU from me"; the subject must match exactly. The additional rules also reject mail whose subject carries the Re: prefix added on reply or the FW: prefix added on forwarding.
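The exact-match semantics of those three R lines can be sketched as a tiny predicate (a hypothetical illustration, not how sendmail itself evaluates rulesets):

```python
def subject_blocked(subject, worm="ILOVEYOU"):
    """Exact-match semantics of the ruleset above: only the bare worm
    subject and its Re:/FW: variants are rejected."""
    return subject in (worm, "Re: " + worm, "FW: " + worm)
```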

Next let's filter the Sircam virus mail that was prevalent a while ago. Looking at Sircam's headers, unlike normal mail they include a part such as Content-Disposition: Multipart message, and this characteristic can be used for filtering.

Add the following ruleset to the sendmail.cf file:

HContent-Disposition: $>check_sircam
D{SIRCAM}"Warning: I  Guess Sircam.worm Virus"
Scheck_sircam
RMultipart message      $#error $: 550 ${SIRCAM}

# Note: the blank before $#error must be a tab, not spaces.

After editing sendmail.cf, do not restart sendmail right away; first test that the ruleset behaves as expected:

# /usr/lib/sendmail -bt                 # enter address test mode
ADDRESS TEST MODE (ruleset 3 NOT automatically invoked)
Enter <ruleset> <address>
> check_sircam Multipart message        # test the Sircam ruleset
check_sircam       input: Multipart message
check_sircam     returns: $# error $: 550 553 Warning: I  Guess Sircam.worm Virus
> ctrl-D                                # end the test

Once this checks out, restart sendmail (killall -HUP sendmail) and the rule takes effect immediately. If you then watch the log file with tail -f /var/log/maillog, you can see Sircam mail actually being filtered:

Sep 27 15:09:51 www sendmail[21386]: f8369of21386: to=<antihong at tt.co.kr>, delay=00:00:01, pri=241584 Warning: I  Guess Sircam.worm Virus.

Finally, let's look at filtering the Nimda worm variants that have recently caused the most trouble. Unlike a normal mail message, Nimda's header contains a part such as
boundary="====_ABC1234567890DEF_===="       or
boundary="====_ABC123456j7890DEF_===="
and this can be used for filtering: if a message header carries either string, treat it as the Nimda worm and filter it. Configured in sendmail.cf the same way as for Sircam:

HContent-Type: $>check_ct

D{NIMDA}"I guess NIMDA.WORM!!!"

Scheck_ct
R$+boundary="====_ABC1234567890DEF_===="       $#error $: 550 ${NIMDA}
R$+boundary="====_ABC123456j7890DEF_===="       $#error $: 550 ${NIMDA}

For more detailed information on mail filtering, see
http://certcc.or.kr/paper/tr2001/tr2001-03/email security by procmail.html or
http://quanta.khu.ac.kr/~dacapo/sendmail/rulesets/ .
There are also several programs that can scan or filter virus mail; see http://www.rav.ro/ , http://www.amavis.org/ , and http://www.sophos.com/ .


When Mail Cannot Be Retrieved

Sometimes pressing "Send/Receive" in Outlook Express fails to retrieve mail. There can be several reasons, as listed below; check the possible causes one by one.
(1) The IMAP package is not installed
To retrieve mail delivered to your account on the server from a client PC, the pop3 daemon must respond. pop3d is included in the IMAP package, so the IMAP package must be installed before pop3 can be used. If you installed with RPM, check whether the imap package is present with rpm -q imap, or check whether the file /usr/sbin/ipop3d exists.
(2) Not configured in inetd
pop3d runs under inetd or xinetd.
Look in /etc/inetd.conf or /etc/xinetd.conf to see whether ipop3 is commented out or pop3 is set to disable = yes.
(3) Check whether TCP Wrapper blocks it
Check whether /etc/hosts.deny blocks access to pop3d.
(4) Check whether the account is locked
If the line drops or the PC crashes while mail is being retrieved, terminating the session abnormally, the server's pop3d process sometimes does not die and lingers. The account is then said to be "locked"; find that process and kill it. If you try to retrieve mail from Outlook Express while the account is locked, you get an error such as:
"There was a problem logging onto your mail server. Your password was rejected.
Account: 'temazone.com', Server: 'tt.co.kr', Protocol: POP3, Server Response:
'-ERR Can't get lock.  Mailbox in use', Port: 110, Secure(SSL): No,
Server Error: 0x800CCC90, Error Number: 0x800CCC92"

(5) Too many pop3 connections
inetd, under which pop3d is served, is configured by default to accept 40 connections per 60 seconds (that is, to fork at most 40 times). So when pop3d requests pile up in a short time, the mail log shows the message pop3/tcp server failing (looping) and the pop3d daemon itself shuts down; you must raise the limit on concurrently handled processes appropriately.
To do so, open /etc/inetd.conf and change it as follows:

before)
pop-3    stream    tcp    nowait    root     /usr/sbin/tcpd    ipop3d

after)
pop-3    stream    tcp    nowait.200    root    /usr/sbin/tcpd    ipop3d

(Here the number of processes handled was raised to 200.)
Then run killall -HUP inetd.

(6) Check via port 110
You can check manually by connecting directly to port 110, the pop3d port, as below:

# telnet pop3.tt.co.kr 110            # check directly on port 110
Trying 210.17.6.5...
Connected to pop3.tt.co.kr.
Escape character is '^]'.
+OK POP3 pop3.tt.co.kr v2001.76 server ready
user abc                              # log in with account abc
+OK User name accepted, password please
pass xyz                              # enter abc's password xyz
+OK Mailbox open, 10 messages
quit                                  # close the connection
+OK Sayonara
Connection closed by foreign host.

The session above is the normal case; when there is an error (for example, a mismatched password produces a message like -ERR Bad login), you can read the specific error message for each situation.

(7) Check with mail -v
From another server, send mail with mail -v antihong at tt.co.kr and check whether it arrives normally. Sending with the -v option lets you see the delivery path and the messages exchanged between the mail servers, which helps in finding the cause of the problem.



특정한 곳으로만 메일이 돌아올 때

다른 곳은 문제가 없는데, 해외등 특정한 곳으로만 메일이 전송되지 않고 리턴
되는 경우가 있다. 이러한 경우라면 자신의 메일서버가 mail-abuse.org 의 블
랙 리스트에 등록되어 있지는 않은지 확인해 볼 필요가 있다. 특히 회신된 메
일에 “...refused by blackhole site relays.mail-abuse.org” 와 같은 메시
지가 보인다면 반드시 여부를 확인해 보아야 한다. 적지 않은 메일 서버에서
는 메일 수신시 실시간으로 이 데이터를 참조하므로 mail-abuse.org 에서 스
팸 메일 서버로 등록되면 이 기관에 등록된 도메인으로 메일을 보낼 때 받는
쪽에서는 스팸 메일로 간주하고 수신을 거부하게 된다. 이를 확인하는 방법은
http://mail-abuse.org/cgi-bin/nph-rss 사이트에서
메일 서버의 IP 를 조회
해 보면 된다.  아래는 위 사이트에서 한 IP 에 대해 조회해 본 결과 블랙 리
스트에 등록되어 있는 것을 보여주고 있다. 이러한 경우라면 조회한 메일 서버
의 Relay 가 허용되어 스팸 메일 서버로 사용된 적이 있거나 현재 사용되고 있
다는 뜻이다.  만약 스팸메일 서버로 등록되어 있지 않다면 211.47.65.xxx is
NOT currently on the RSS list 와 같이 보이게 된다. 






To remove your mail server from this blacklist, first check whether relaying is allowed on your mail server, configure the server to refuse relaying, and then follow the link "If you'd like 211.47.65.135 to be removed from our list, please click here." Clicking the link brings up a request form; fill it in and the request is processed right away. The request is only processed after relay refusal is configured, so be sure to verify the relay-refusal setting beforehand. For how to check whether a mail server relays, see "the complete monitoring guide for iron-clad security" in the October issue of this magazine.


Points to Note When Configuring Multiple MX Records

An MX record configured on the DNS server routes mail addressed to one host to another host. It is used in particular when separating the web server from the mail server. Let's look at how sendmail behaves when a remote host sends mail to the domain tt.co.kr configured as below:

tt.co.kr.      IN   MX   10   mail1.tt.co.kr.
               IN   MX   20   mail2.tt.co.kr.
               IN   MX   20   mail3.tt.co.kr.

The following shows the order in which delivery is attempted:

(1) Delivery is first attempted to mail1, which has the lowest preference value (10).
(2) If mail1.tt.co.kr is unreachable, delivery is attempted to mail2 or mail3.
(3) If the server tried in (2) is also unreachable, delivery is attempted to the host not tried in (2); that is, if mail2 was tried, delivery is attempted to mail3.
(4) If neither mail2 nor mail3 is reachable, the mail is queued locally and steps 1-3 are retried periodically for a set period.
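The ordering in steps 1-4 can be sketched as follows (a simplified illustration; real MTAs pick randomly among equal-preference hosts, while plain sorting is used here for determinism):

```python
def mx_try_order(records):
    """Given (preference, host) pairs from the zone above, return the
    order in which a sending MTA attempts delivery: lowest preference
    first."""
    return [host for pref, host in sorted(records)]

mx_records = [(10, "mail1.tt.co.kr."), (20, "mail2.tt.co.kr."), (20, "mail3.tt.co.kr.")]
```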

A common misconception about MX records is to think that if mail1 is down and the mail is delivered to mail2, the message is stored in a mailbox on mail2. If that were the case, users would apparently have to configure several pop3 servers, such as mail1.tt.co.kr and mail2.tt.co.kr, to retrieve their mail. In practice, however, mail servers with higher preference values (i.e. lower priority), such as mail2.tt.co.kr and mail3.tt.co.kr, are usually configured to act as queueing servers, so in the end the mail converges on a single host (mail1). For mail2 and mail3 to act as queueing mail servers as above, sendmail on mail2 and mail3 must be configured as follows:

(1) They must not hold authority for the domain (tt.co.kr).
(That is, tt.co.kr must not be configured in the sendmail.cw or local-host-names file on the mail2 or mail3 server.)

(2) The server must allow mail relaying to the destination host.
(That is, it must be defined in /etc/mail/access as below.)
mail1.tt.co.kr    relay

Not holding authority means the tt.co.kr domain must not be configured in sendmail's w class (sendmail.cw (local-host-names), or Cw in sendmail.cf); mail relaying means forwarding mail to the destination host when the final destination of incoming mail is not the local server, that is, when it holds no authority for it. Recent distributions configure sendmail to refuse relaying by default, so note that a mail queueing server must be configured to allow relaying for mail destined for the target host. Mail that reaches mail2 because mail1 is down stays stored in the mail queue and is periodically (per the option given when sendmail was started, typically every 30 minutes: -q30m) redelivered to mail1 for a set period (as long as the Timeout.queuereturn=5d specified in sendmail.cf).


How to Hide the Mail Server Version

As with any other daemon, connecting to the mail server's port reveals information such as its version. For security reasons, however, a system administrator may want to hide or disguise the version of the mail server in operation; in that case use the methods below.

(1) sendmail
  sendmail.cf contains the following setting:
# SMTP initial login message (old $e macro)
O SmtpGreetingMessage=$j Sendmail $v/$Z; $b
Trim this line appropriately or replace it with other information, then restart sendmail.
The author's mail server is set to
O SmtpGreetingMessage=$j Today and Tomorrow(http://tt.co.kr/)
and the information shown when connecting to port 25 is:

# telnet tt.co.kr 25
Trying 211.47.66.50...
Connected to tt.co.kr.
Escape character is '^]'.
220 www10.tt.co.kr ESMTP Today and Tomorrow(http://tt.co.kr/)

(2) pop3d
For pop3d you must edit the source directly. In the unpacked source tree, the /src/ipopd directory contains ipop3d.c, which includes the line

char *version = "2001.75";   /* server version */

On the author's pop3d this was changed in the source to

char *version = "xxxxxxxxxx";   /* server version */

before compiling, and the information shown on a remote connection to port 110 is:

# telnet tt.co.kr 110
Trying 211.47.66.50...
Connected to tt.co.kr.
Escape character is '^]'.
+OK POP3 www10.tt.co.kr vxxxxxxxxxx server ready

Other pieces of information besides the version can be modified as well; adjust them to suit your situation.


A Few Commands Related to sendmail

>> mailq
The mailq program shows summary information about queued mail messages (those stored in /var/spool/mqueue). Mail that could not be sent right away for some particular reason, such as a network outage, is first stored queued in /var/spool/mqueue and then retried periodically. To see summary information about the currently queued messages:

# mailq

/var/spool/mqueue/q1 (2 requests)
----Q-ID---- --Size-- -----Q-Time----- ------------Sender/Recipient------------
f7A84oV15068     1446 Fri Aug 10 17:04 nobody
                (Deferred: Connection timed out with kebi.net.)
                                      darling at kebi.net
f775ieF24893   521898 Tue Aug  7 14:44 <shlee at tt.co.kr>
                (Deferred: Connection timed out with mail.unitel.net.)
                                      <cf1318 at unitel.net>
/var/spool/mqueue/q2 is empty
/var/spool/mqueue/q3 (1 request)
----Q-ID---- --Size-- -----Q-Time----- ------------Sender/Recipient------------
f775nJF25249   230815 Tue Aug  7 14:49 <shlee at tt.co.kr>
                (Deferred: Connection timed out with hanmail.com)
                                      cuwww23 at hanmail.com

These messages let you guess why the mail has not been delivered. All three were sent with the recipient's e-mail domain mistyped: kebi.net instead of kebi.com, unitel.net instead of unitel.co.kr, and hanmail.com instead of hanmail.net, so the server cannot deliver them and they remain stored in the queue.
One caution: mailq can be run by ordinary users, so it is a good idea to tighten its permissions (for example to 700) so that ordinary users cannot run it.
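The deferral reasons in that output follow a fixed pattern, so they can be pulled out mechanically. A hypothetical sketch for log triage (the helper name and regex are my own, tuned to the "Connection timed out" lines shown above):

```python
import re

def deferred_hosts(mailq_output):
    """Collect the remote host from each '(Deferred: Connection timed out
    with HOST.)' line in mailq output."""
    return re.findall(r"Connection timed out with ([\w.-]+?)\.?\)", mailq_output)
```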

>> mailstats
The mailstats program shows statistics about current mail sending and receiving.

  * To see the current mail statistics:

# mailstats
Statistics from Sat Aug 11 04:02:02 2001
M   msgsfr  bytes_from   msgsto    bytes_to  msgsrej msgsdis  Mailer
1        0          0K        3        317K        0       0  *file*
4      690     596691K      824     137070K    68426       0  esmtp
9       63      12212K        0          0K       27       0  local
=============================================================
T      753     608903K      827     137387K    68453       0
C      753                  827                68453

Used appropriately with mrtg, these figures can be turned into periodic statistics and graphed to show the number of messages sent and received over time. (See "the complete monitoring guide for iron-clad security" in the October issue of this magazine.)

About Recent sendmail Bugs

After a long quiet spell, several security problems have recently been found in sendmail. They are very serious, yet many users still seem to be running affected versions unaware. Be sure to check whether your mail server is affected.

First, the bug announced at the end of August is a security hole in sendmail versions before 8.11.6, which most mail servers currently run: a very serious bug that lets an ordinary local user obtain root privileges, and exploit code has already been published on several sites.
Note that this bug affects only versions 8.11.0 through 8.11.5; 8.10.x and 8.9.x are not affected. So upgrade sendmail to the latest version, 8.11.6 or 8.12, with the help of the sites below.

Versions 8.11.0 through 8.11.5 should be upgraded to 8.11.6, and 8.12.0.Beta to 8.12.0.Beta19 or later. For details see
http://www.securityfocus.com/bid/3163
http://www.sendmail.org/8.11.html

Second, a bug found in early October affects all versions and is a problem that has surfaced before: an ordinary user with shell access can use sendmail's -q option to drop the messages in the queue. See the following:

[user@net user]$ id
uid=778(user) gid=778(user)
[user@net user]$ mailq
               Mail Queue (1 request)
--Q-ID-- --Size-- -----Q-Time----- ------------Sender/Recipient------------
NAA05248       11 Tue Oct  2 13:03 user1
                (Deferred: Connection refused by tt.co.kr.)
                                  test at tt.co.kr

[system@net system]$ /usr/sbin/sendmail -q -h10000
Too many hops 10000 (25 max): from system via localhost, to test at tt.co.kr
Too many hops 10000 (25 max): from MAILER-DAEMON via localhost, to postmaster
Too many hops 10000 (25 max): from MAILER-DAEMON via localhost, to postmaster
MAILER-DAEMON... Saved message in /usr/tmp/dead.letter
[user@net user]$ mailq
Mail queue is empty

By setting an absurdly large hop count in this way, an ordinary user can forcibly drop the current contents of the queue.

Third, also affecting all versions: an ordinary user who runs sendmail -q -d0-xxxx.xxx (where xxx is a debugging level) can view not only the mail server's various settings but also the contents of the queue, message paths and subjects, mail software details, and so on.
The second and third problems can both be resolved by adding restrictqrun in sendmail.cf:

O PrivacyOptions=authwarnings,novrfy,noexpn,restrictqrun


기타 메일과 관련된 장애가 확인 시

지난달 아파치 웹서버의 장애에 대해 이야기하면서 문제나 장애가 발생시에는
웹서버의 error_log  메시지를 살펴보도록 이야기 했었다.  메일서버도 마찬가
지이다. 메일서버 장애시는 문제의 원인을 찾기 위해 로그 파일을 살펴보는 습
관을 들이는 것이 좋다.
메일 관련 로그는 /var/log/messages 나 /var/log/maillog 파일을 살펴보면 되
며 로그파일을 보면 여기에서 언급하지 않은 문제가 발생했다 하더리도 어렵
지 않게 원인을 찾을 수 있을 것이다.   다시 한번 강조하지만 모든 문제의 원
인과 해결책은 로그에 있다는 것을 명심하기 바란다.
2006/09/08 23:28
Source blog: wooya510's blog
Original: http://blog.naver.com/wooya510/60005129890

Detailed Steps

* Prepare a Chroot-ed File System

  1. Set up the tree anywhere (preferably another disk or a non-system partition to discourage others from making hard links to files outside your web tree), but use a symlink (eg /www) to reference it.
    ROOT# mkdir /export/misc/www
    ROOT# ln -s /export/misc/www /www

  2. Create the basic directories; bin will be a link to usr/bin.
    !! NOTE the lack of leading slashes in this example, except where I copy files from the regular file system. Do NOT mix up your chroot-ed tree with your real '/'!

  3. You have been warned!
    ROOT# cd /www
    ROOT# mkdir -p usr/bin usr/lib lib etc tmp dev webhome
    ROOT# ln -s usr/bin bin

  4. /tmp is given special perms.
    ROOT# chmod 777 tmp
    ROOT# chmod +t tmp

  5. Make the special device /dev/null
    ROOT# mknod -m 666 dev/null c 1 3

  6. Set up the timezone info for YOUR timezone (this example uses MET):
    ROOT# mkdir -p usr/share/zoneinfo
    ROOT# cp -pi /usr/share/zoneinfo/MET usr/share/zoneinfo/
    ROOT# cd etc
    ROOT# ln -s ../usr/share/zoneinfo/MET localtime
    ROOT# cd ..

  7. You will find that perl and/or mod_perl will complain about the lack of locale settings. To fix this install your locale files in the chroot-ed tree:
    ROOT# set |grep LANG
    LANG=en_US

    ROOT# mkdir /www/usr/share/locale
    ROOT# cp -a /usr/share/locale/en_US /www/usr/share/locale/

  8. Now copy in the shared libraries that provide a very basic chroot-ed file system
    ROOT# cp -pi /lib/libtermcap.so.2 /lib/ld-linux.so.2 /lib/libc.so.6 lib/

  9. Test your tree ('cat' will be needed by 'apachectl' later, but is not strictly necessary):
    ROOT# cp -pi /bin/ls /bin/sh /bin/cat bin/
    ROOT# chroot /www /bin/ls -l /
    lrwxrwxrwx   1 0        0               7 Jan 29 09:24 bin -> usr/bin
    drwxr-xr-x   2 0        0            1024 Jan 29 09:28 dev
    drwxr-xr-x   2 0        0            3072 Jan 29 13:17 etc
    drwxr-xr-x   2 0        0            1024 Jan 29 13:12 lib
    drwxrwxrwt   2 0        0            1024 Jan 29 09:23 tmp
    drwxr-xr-x   5 0        0            1024 Jan 29 09:23 usr
    drwxr-xr-x   2 0        0            1024 Jan 29 10:41 webhome
  10. You can remove 'ls'; it was only used for testing:
    ROOT# rm bin/ls

* Prepare a User and the Naming Service

Here we create the user whom apache will run as, and the necessary naming services for this configuration.
  1. Create a new user that doesn't exist on the system, and give the user a unique name (eg: www) and user id (eg: 888). Note that it isn't actually necessary for the user:group to exist in the real authentication (/etc/passwd /etc/group) files. It's up to you.
    ROOT# cd /www
    ROOT# touch etc/passwd etc/group etc/shadow
    ROOT# chmod 400 etc/shadow

  2. Edit these three files. For the sake of this example I am just echoing the data into the files:
    ROOT# echo 'www:x:888:888:Web Account:/webhome:/usr/bin/False' > etc/passwd
    ROOT# echo 'www:x:888:' > etc/group
    ROOT# echo 'www:*:10882:-1:99999:-1:-1:-1:134537804' > etc/shadow

  3. I have given this user no login and no shell. Just to be complete, let's compile a 'no-go' shell called False:
    ROOT# echo 'int main(int argc, char *argv[]) { return(1); }' > /tmp/False.c
    ROOT# cc -o /www/usr/bin/False /tmp/False.c

  4. While we are at it, lets mark the binaries as execute-only:
    ROOT# chmod 111 usr/bin/*

  5. Some naming services will be required. With glibc and the Name Service Switch libraries the necessary libraries are not immediately obvious. See 'man nsswitch' for details.
    I chose to rely on files and DNS, even though I also run NIS on my home machines.

    Note: The libresolv library will be needed as well (This will become evident when PHP is installed).
    ROOT# cp -pi /lib/libnss_files.so.2 lib/
    ROOT# cp -pi /lib/libnss_dns.so.2 lib/

  6. We will need 3 files to complete the configuration for Naming Service. The contents of these files will depend on your IP and DNS setup. Here we assume that the web server is named ns.mynet.home with IP address 192.168.196.2 (it is actually also my naming server):
    # ---- Contents of    etc/nsswitch.conf ----#
    passwd: files
    shadow: files
    group: files
    hosts: files dns

    # ---- Contents of    etc/resolv.conf ----#
    domain mynet.home
    ## use the IP address of your naming server
    ## if bind is not installed on your web server
    #nameserver 192.168.196.xxx
    ## use this if your web server is a (caching) name server
    nameserver 127.0.0.1

    # ---- Contents of    etc/hosts ----#
    127.0.0.1 localhost loopback
    192.168.196.2 ns.mynet.home ns www
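For reference, the single passwd line created in step 2 above splits into the usual seven colon-separated fields. A small illustrative sketch (the helper name is my own):

```python
def parse_passwd_entry(line):
    """Split one /etc/passwd line into its seven named fields."""
    keys = ("name", "passwd", "uid", "gid", "gecos", "home", "shell")
    return dict(zip(keys, line.strip().split(":")))

entry = parse_passwd_entry("www:x:888:888:Web Account:/webhome:/usr/bin/False")
```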

* Compile and Install Apache

  1. Make the top-level directory for the apache install (in this example: /apache); and create a symlink to it in the real tree:
    ROOT# mkdir /www/apache
    ROOT# ln -s /www/apache /apache

  2. I normally compile and install as an ordinary user (in this example: softs), rather than as 'root'. Note, however, that the installation of apache should be done as root. (See the 'security tips' in the online apache documentation)
    In this case I compile sources in /usr/local/src/chr, which is owned by softs:softs
    $ cd /usr/local/src/chr
    $ tar zxf /path/to/apache_1.3.12.tar.gz
    $ cd apache_1.3.12

  3. Edit config.layout so that it includes a special layout called chroot:
    #   chroot layout.
    <Layout chroot>
        prefix:        /apache
        exec_prefix:   $prefix
        bindir:        $exec_prefix/bin
        sbindir:       $exec_prefix/bin
        libexecdir:    $exec_prefix/libexec
        mandir:        $prefix/man
        sysconfdir:    $prefix/conf
        datadir:       $prefix
        iconsdir:      $datadir/icons
        htdocsdir:     $datadir/htdocs
        cgidir:        $datadir/cgi-bin
        includedir:    $prefix/include
        localstatedir: $prefix/var
        runtimedir:    $localstatedir/logs
        logfiledir:    $localstatedir/logs
        proxycachedir: $localstatedir/proxy
    </Layout>
  4. Now configure and make:
    • non-DSO:
      $ ./configure --with-layout=chroot \
      --enable-module=most --enable-module=so

      By enabling the module 'so' you have the possibility of extending your Apache installation later via 3rd-party modules through the DSO+APXS mechanism.
    • DSO:
      $ ./configure --with-layout=chroot \
      --enable-module=most --enable-shared=max


    $ make
    ROOT# make install ## I am root!

  5. Copy the other shared libraries that will be needed by Apache as configured in this example. NOTE that other configurations may require other libraries (use ldd to find out).
    ROOT# cd /www
    ROOT# cp -pi /lib/libm.so.6 /lib/libcrypt.so.1 /lib/libdb.so.3 lib/
    ROOT# cp -pi /lib/libdl.so.2 lib/
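The "use ldd to find out" advice in step 5 can be scripted. A minimal sketch, assuming the chroot tree layout used in this document (/www as the root); it only defines helper functions, which you would run as root:

```shell
# Extract the absolute library paths from ldd output.
lib_paths() {
    awk '$2 == "=>" && $3 ~ /^\// { print $3 }
         $1 ~ /^\// && NF == 2    { print $1 }'
}

# Copy every shared library a binary needs into the chroot tree,
# preserving the original directory layout.
# $1 = binary to inspect, $2 = chroot root (e.g. /www)
copy_libs() {
    ldd "$1" | lib_paths | while read -r lib; do
        mkdir -p "$2$(dirname "$lib")"
        cp -pi "$lib" "$2$(dirname "$lib")/"
    done
}
# Usage (as root):
# copy_libs /www/apache/bin/httpd /www
```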

  6. Do a quick test to see that it worked. The main fields to edit in the configuration file /www/apache/conf/httpd.conf for a quick test are:
    User www
    Group www
    ServerName yourserver.yourdomain.here
    Port 8088
      ## pick your favourite test port
    Here are sample configuration files:
    non-DSO: httpd.conf (colorized), httpd.conf (plain text)
    DSO:     httpd.conf (colorized), httpd.conf (plain text)

  7. Start the daemon (you need to be root):
    ROOT# chroot /www /apache/bin/apachectl start

  8. Test the URL:
    $ lynx -dump http://yourserver/
    Test the URL if on another port, eg: 8088:
    $ lynx -dump http://yourserver:8088/

  9. Here is a small perl script that removes most of the comments from the generated config files, for those who want to simplify the file.
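The linked perl script is not reproduced here; a rough shell stand-in, assuming all it does is drop full-line comments and blank lines:

```shell
# Rough stand-in for the linked comment-stripping script (assumption: it
# simply removes full-line comments and blank lines from httpd.conf).
strip_comments() { grep -Ev '^[[:space:]]*(#|$)' "$@"; }

# Usage:
# strip_comments /www/apache/conf/httpd.conf > httpd.conf.trimmed
```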

  10. This would be a good time to give ownership of the htdocs tree to the web tree 'owner':
    ROOT# chown -R 888:888 /www/apache/htdocs

* Compile and Install MySQL

MySQL is not installed in the chroot-ed tree; indeed, it should probably be installed on another system. But on my home system it is installed on the same server as apache.

This example includes creating the user and the place where the database will reside, and the creation of the initial database.

  1. Create the user who will own the mysql database -- for example, 777:777 in /home/mysql:
    ROOT# groupadd -g 777 mysqldba
    ROOT# useradd -c "mysql DBA" -d /home/mysql -u 777 -g 777 -m -n mysql

  2. unpack the source and give ownership of the mysql source tree to the mysql user:
    ROOT# mkdir /usr/local/mysql
    ROOT# chown mysql:mysqldba /usr/local/mysql
    ROOT# cd /usr/local/src
    ROOT# tar zxf /path/to/mysql-3.22.27.tar.gz
    ROOT# chown -R mysql:mysqldba /usr/local/src/mysql-3.22.27

  3. Now as the mysql user, make a directory for the database, and compile and install mysql:
    $ mkdir ~/db ## where the DB will reside
    $ cd /usr/local/src/mysql-3.22.27
    $ ./configure --localstatedir=/home/mysql/db --prefix=/usr/local/mysql
    $ make
    $ make install

  4. Create the *MySQL* grant tables (necessary only if you haven't installed *MySQL* before):
    $ ./scripts/mysql_install_db

  5. Install and modify the database startup script, changing the database owner from root to 'mysql':
    ROOT# cd /usr/local/src/mysql-3.22.27/
    ROOT# cp support-files/mysql.server /etc/rc.d/init.d/
    ROOT# chmod 755 /etc/rc.d/init.d/mysql.server
    ROOT# [ edit /etc/rc.d/init.d/mysql.server: ]
    mysql_daemon_user=mysql ## so we can run mysqld as this user.
    ROOT# chkconfig --add mysql.server ## permanently add server to rc scripts

  6. It may be necessary to refresh the shared library cache after installing mysql:
    ROOT# /sbin/ldconfig -nv /usr/local/lib

  7. Edit the PATH variable for the mysql owner, and set up the 'root' password for the database (read the documentation!) (and you will probably want to delete the test database and associated entries):
    $ [ Edit shell login script .bash_profile: ]
         PATH=$PATH:$HOME/bin:/usr/local/mysql/bin
    $ . ~/.bash_profile ## source it!
    $ mysqladmin -u root password '2mUch!data' ## pick your own password!

* Compile and Install PHP

  1. Stop the apache daemon, if it is running:
    ROOT# chroot /www /apache/bin/apachectl stop

  2. You must first compile PHP; then, for non-DSO installs only, you must recompile Apache. (You will need to do this each time you upgrade either software package for non-DSO installs.)
    $ cd /usr/local/src/chr ## I am NOT root!
    $ tar zxf /path/to/php-4.02.tar.gz
    $ cd php-4.02
    • non-DSO:
      $ ./configure --with-mysql=/usr/local/mysql \
      --with-apache=../apache_1.3.12 --enable-track-vars \
      --with-config-file-path=/apache/conf --sharedstatedir=/tmp
    • DSO:
      $ ./configure --with-mysql=/usr/local/mysql \
      --with-apxs=/apache/bin/apxs --enable-track-vars \
      --with-config-file-path=/apache/conf --sharedstatedir=/tmp
    • DSO:
      (or add CFLAGS switch when mod_ssl was also configured as a DSO module)
      $ CFLAGS=-DEAPI ./configure --with-mysql=/usr/local/mysql \
      --with-apxs=/apache/bin/apxs --enable-track-vars \
      --with-config-file-path=/apache/conf --sharedstatedir=/tmp
    $ make
    • non-DSO:
      $ make install
    • DSO:
      ROOT# make install
      (You will need to be root for the DSO 'make install' of PHP, since the module goes directly into the module tree, /apache/libexec/, and the apache configuration file is altered.)

  3. Now for the non-DSO installation only, recompile Apache, activating the PHP module:
    $ cd ../apache_1.3.12/
    $ ./configure --with-layout=chroot \
    --enable-module=most --enable-module=so \
    --activate-module=src/modules/php4/libphp4.a

    $ make
    ROOT# make install ## I am root!

  4. More shared libraries (for PHP) are needed in the chrooted tree; check with 'ldd':
    • For non-DSO: ldd /apache/bin/httpd
    • For DSO: ldd /apache/libexec/libphp4.so
    A little for-loop can be used to copy the needed files from /lib and from /usr/lib:
    ROOT# cd /www
    ROOT# for i in libresolv.so.2 libnsl.so.1 libpam.so.0 ; do
    > cp -pi /lib/$i /www/lib/ ; done

    ROOT# for i in libgd.so.1 libgdbm.so.2 libz.so.1; do
    > cp -pi /usr/lib/$i /www/usr/lib/ ; done


  5. If you will be needing mysql, you must install that library as well from the place it was compiled into:
    ROOT# cp -pi /usr/local/mysql/lib/mysql/libmysqlclient.so.6 /www/usr/lib/

  6. You must edit httpd.conf so that it recognizes .php files, if you have not already done so:
    ROOT# cd /apache/conf
    ROOT# [ edit /apache/conf/httpd.conf ]
     AddType application/x-httpd-php .php
     AddType application/x-httpd-php-source .phps


  7. Restart the daemon:
    ROOT# chroot /www /apache/bin/apachectl start

  8. For non-DSO installs you can check for compiled-in PHP:
    ROOT# chroot /www /apache/bin/httpd -l | grep php
    mod_php4.c

  9. Here is a 'hello world' script to test PHP. It should be installed as 'hello.php' (save as type 'text' from netscape), with a copy or a symlink as 'hello.phps' for source-code viewing if you wish. Do not leave it lying around for the public after you have finished testing!
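The linked 'hello world' script is not reproduced here; a minimal stand-in (assumption: the original just verified that PHP executes), written into the document root via a here-doc:

```shell
# Write a minimal hello.php into the given directory.
write_hello_php() {
    cat > "$1/hello.php" <<'EOF'
<?php
  echo "Hello, world!\n";
  echo "PHP version: " . phpversion() . "\n";
?>
EOF
}
# Usage, with a symlink for source viewing:
# write_hello_php /www/apache/htdocs
# ln -s hello.php /www/apache/htdocs/hello.phps
```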

* Compile and Install Perl

You can get away with simply copying /usr/lib/perl5 into /www/usr/lib/, and copying /usr/bin/perl5.00503 (assuming Red Hat 6.0) into /www/usr/bin/. You would need to check for, and install, any missing shared libraries. You should also make a hard link from usr/bin/perl5.00503 to usr/bin/perl inside /www.

The easy way:

ROOT# cp -a /usr/lib/perl5 /www/usr/lib/perl
ROOT# cp -p /usr/bin/perl5.00503 /www/usr/bin/
ROOT# cd /www/usr/bin
ROOT# ln perl5.00503 perl

Nonetheless (the slightly less easy way :o), I show how to compile and install perl. If you are going to install mod_perl then you really must compile perl here:

  1. Make the necessary links to install into the chroot-ed tree. This example uses /usr/Local inside the tree. The choice of /usr/Local is deliberate -- do not confuse it with /usr/local.
    Ever fearful :-O, I installed as 'softs':

    ROOT# mkdir /www/usr/Local
    ROOT# ln -s /www/usr/Local /usr/Local
    ROOT# chown softs:softs /www/usr/Local

  2. Get the source RPM from the redhat sources:
    ROOT# rpm -i /path/to/perl-5.00503-2.src.rpm

  3. As the owner (softs) of the source tree, unpack perl.
    $ cd /usr/local/src/chr
    $ tar zxf /usr/src/redhat/SOURCES/perl5.005_03.tar.gz

  4. Red Hat supplies some patches in the SRPM; you can apply them to this installation as well. This example illustrates patching perl from a Red Hat 6.0 distribution.
    $ cp /usr/src/redhat/SOURCES/perl*.patch .
    $ cd perl5.005_03
    $ patch -p1 <../perl5-installman.patch
    $ patch -p1 <../perl5.005_02-buildsys.patch
    $ patch -p1 <../perl5.005_03-db1.patch

  5. You need to run Configure, accepting most of the defaults from the script. You will also probably want to specify 'none' for the man pages. A few of the defaults to change in my example are listed below:
    $ ./Configure
    • architecture name? i386-linux
    • Installation prefix to use? /usr/Local
    • Directories to use for library searches? /lib /usr/lib /usr/Local/lib
    • install perl as /usr/bin/perl? n

  6. Compile and install it.
    $ make
    $ make test
    $ make install

  7. Create a symlink to perl in the usr/bin tree. If you are not installing mod_perl (see ahead), then you could change ownership of the perl tree to root (though this is not necessary as long as the permissions on the perl tree are read-only for the web-tree owner: uid '888' in this example):
    ROOT# cd /www/usr/bin
    ROOT# ln -s ../Local/bin/perl perl

  8. Check the shared libraries and install any missing libraries (depends on your configuration options). No extra libraries are needed in this example:
    ROOT# ldd /www/usr/bin/perl
      libnsl.so.1 => /lib/libnsl.so.1 (0x4001b000)
      libdl.so.2 => /lib/libdl.so.2 (0x40031000)
      libm.so.6 => /lib/libm.so.6 (0x40035000)
      libc.so.6 => /lib/libc.so.6 (0x40052000)
      libcrypt.so.1 => /lib/libcrypt.so.1 (0x40147000)
      /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
  9. Test your installation:
    ROOT# chroot /www /usr/bin/perl -v

    This is perl, version 5.005_03 built for i386-linux...
  10. Set up the example perl cgi bin script installed with the apache server:
    ROOT# cd /www/apache/cgi-bin
    ROOT# chmod ugo+x *

  11. Start your apache server, and test the example perl cgi bin script installed with the server:
    ROOT# chroot /www /apache/bin/apachectl start
    $ lynx -dump http://yourserver/cgi-bin/printenv
    While you are at it, check test-cgi for shell CGI access as well:
    $ lynx -dump http://yourserver/cgi-bin/test-cgi

  12. Finally, REMOVE the execute bit from the example cgi-scripts, or remove them entirely. Don't let the general public have access to these scripts:
    ROOT# chmod ugo-x /www/apache/cgi-bin/*

* Compile and Install mod_perl

  1. This process assumes that your chroot-ed perl tree is owned by your source-code owner. If this is not the case, change the ownership:
    ROOT# chown -R softs:softs /www/usr/Local

  2. Extract the source code for mod_perl:
    $ cd /usr/local/src/chr
    $ tar zxf /path/to/mod_perl-1.24.tar.gz
    $ cd mod_perl-1.24

  3. IMPORTANT: Put /usr/Local/bin first in the path before starting configuration. This helps to avoid problems with the configuration finding /usr/bin/perl or /usr/local/bin/perl first in your environment:
    $ which perl
    /usr/bin/perl

    $ export PATH=/usr/Local/bin:$PATH ## assuming a bourne shell
    $ which perl
    /usr/Local/bin/perl


  4. Now configure mod_perl; configure the Perl-side of mod_perl, and prepare the mod_perl subdirectory inside apache:
    • non-DSO:
      $ perl Makefile.PL APACHE_SRC=../apache_1.3.12/src \
      DO_HTTPD=1 USE_APACI=1 PREP_HTTPD=1 EVERYTHING=1


    • DSO:
      $ perl Makefile.PL USE_APXS=/apache/bin/apxs \
      EVERYTHING=1


    • DSO:
      ( or add the CFLAGS switch when mod_ssl was also configured as a DSO module [and respect any message about apache include files being in unusual places!] )
      $ CFLAGS=-DEAPI perl Makefile.PL USE_APXS=/apache/bin/apxs \
      EVERYTHING=1


  5. Now make and install mod_perl into the chroot-ed tree:
    $ pwd
    /usr/local/src/chr/mod_perl-1.24

    $ make
    • non-DSO:
      $ make install
    • DSO:
      ROOT# make install
      (You will need to be root for the DSO 'make install' of mod_perl, since the module goes directly into the module tree, /apache/libexec/, and the apache configuration file is altered.)

  6. For the non-DSO installs only you must recompile apache and reinstall it:
    $ cd /usr/local/src/chr/apache_1.3.12/
    $ ./configure --with-layout=chroot \
    --enable-module=most --enable-module=so \
    --activate-module=src/modules/php4/libphp4.a \
    --activate-module=src/modules/perl/libperl.a

    $ make
    Stop apache if it was previously running, and install it:
    ROOT# chroot /www /apache/bin/apachectl stop
    ROOT# make install ## I am root!

  7. For non-DSO installs you can check for compiled-in PHP and mod_perl:
    ROOT# chroot /www /apache/bin/httpd -l | grep -E '(php|perl)'
    mod_php4.c
    mod_perl.c


  8. Test your mod_perl setup with the Hello.pm perl module written by Doug MacEachern. You need to install it, edit httpd.conf, and restart apache:
    $ cp -i Hello.pm \
    /www/usr/Local/lib/perl5/site_perl/5.005/i386-linux/Apache/


    Insert the mod_perl configuration necessary for Hello.pm into httpd.conf -- example:
    <Location /hello>
      SetHandler perl-script
      PerlHandler Apache::Hello
    </Location>
    ### Section 3: Virtual Hosts
    ROOT# chroot /www /apache/bin/apachectl restart
    $ lynx -dump http://yourserver:yourPort/hello
       Hello, I see you see me with Lynx/2.8.3dev.18 libwww-FM/2.14.
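The linked Hello.pm is not reproduced here; the following is an approximation of a mod_perl 1.x handler producing output like the sample above (the handler body is an assumption, not MacEachern's original), written into the chroot-ed perl tree via a here-doc:

```shell
# Write an approximate Apache::Hello module into the given directory.
write_hello_pm() {
    cat > "$1/Hello.pm" <<'EOF'
package Apache::Hello;
use Apache::Constants qw(OK);

sub handler {
    my $r = shift;
    $r->content_type('text/plain');
    $r->send_http_header;
    $r->print("Hello, I see you see me with ",
              $r->header_in('User-Agent'), ".\n");
    return OK;
}
1;
EOF
}
# Usage (path as in step 8 above):
# write_hello_pm /www/usr/Local/lib/perl5/site_perl/5.005/i386-linux/Apache
```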
  9. If you are finished testing then comment out this entry in httpd.conf. Don't leave unnecessary functionality enabled in your apache configuration.

  10. NOTE: If you need to install other perl modules, you must put /usr/Local/bin first in your path before running perl Makefile.PL:
    $ export PATH=/usr/Local/bin:$PATH ## assuming a bourne shell

* Compile and Install mod_ssl

I hope you have read the additional notes section if you are planning a DSO install of mod_ssl.

You must compile both openSSL and mod_ssl. I have also elected to compile rsaref version 2.0. You should read the documentation for mod_ssl in the source code table to understand the issues and the options for mod_ssl.

Note that openssl and rsaref provide the include files, libraries and tools to allow you to compile mod_ssl and generate keys, and therefore they are not part of, nor installed into, the chroot-ed tree.

  1. Extract the source code for mod_ssl, openssl and rsaref20:
    $ cd /usr/local/src/chr
    $ tar zxf /path/to/mod_ssl-2.6.6-1.3.12.tar.gz
    $ tar zxf /path/to/openssl-0.9.5a.tar.gz
    $ mkdir rsaref-2.0
    $ cd rsaref-2.0
    $ tar Zxf /path/to/rsaref20.1996.tar.Z

  2. Configure and build the RSA Reference library. Note that on 64-bit architectures you MUST read the documentation in the INSTALL file in the mod_ssl package about portability problems/solutions with rsaref.
    $ cd /usr/local/src/chr/rsaref-2.0
    $ cp -rpi install/unix local
    $ cd local
    $ make
    $ mv rsaref.a librsaref.a

  3. Configure and build the OpenSSL library.
    $ cd /usr/local/src/chr/openssl-0.9.5a
    $ ./config -L/usr/local/src/chr/rsaref-2.0/local -fPIC
    $ make
    $ make test
       # inspect output for anomalies

  4. You may want to install the package. Of course, it is not installed in the chroot-ed tree. Here I assume that softs:softs owns the /usr/local/ tree, because the default install prefix for openssl is /usr/local/ssl. However, it is not necessary to install this package; you can operate out of the src tree for building mod_ssl (but do NOT run make clean!)
    $ make install

  5. Configure mod_ssl:
    $ cd /usr/local/src/chr/mod_ssl-2.6.6-1.3.12
    $ ./configure --with-apache=../apache_1.3.12

  6. Go into the apache tree to complete the build. Run configure and then make:
    $ cd /usr/local/src/chr/apache_1.3.12
    • non-DSO:
      $ SSL_BASE=../openssl-0.9.5a RSA_BASE=../rsaref-2.0/local \
      ./configure --prefix=/apache --with-layout=chroot \
      --enable-module=most --enable-module=so --enable-module=ssl \
      --disable-rule=SSL_COMPAT --enable-rule=SSL_SDBM \
      --activate-module=src/modules/php4/libphp4.a \
      --activate-module=src/modules/perl/libperl.a

    • DSO:
      $ cd src/modules
      $ make clean
      ## seems to be necessary if you previously compiled in the apache tree
      $ cd ../../
      $ SSL_BASE=../openssl-0.9.5a RSA_BASE=../rsaref-2.0/local \
      ./configure --prefix=/apache --with-layout=chroot \
      --enable-module=most --enable-shared=max --enable-shared=ssl \
      --disable-rule=SSL_COMPAT --enable-rule=SSL_SDBM
    $ make

  7. Install apache again. Stop apache if it was previously running, and then install it:
    ROOT# chroot /www /apache/bin/apachectl stop
    ROOT# make install ## I am root!

  8. For non-DSO installs you can check for the compiled-in modules:
    ROOT# chroot /www /apache/bin/httpd -l | grep -E '(php|perl|ssl)'
    mod_ssl.c
    mod_php4.c
    mod_perl.c


  9. Create the random devices in the chroot-ed tree:
    ROOT# cd /www/dev
    ROOT# mknod random c 1 8
    ROOT# mknod urandom c 1 9

  10. Merge the default configuration file into your current httpd.conf file. I test on a different port from the standard port 80 because I usually already have a web server running on port 80. But for the secure port (port 443) I have no web server running, so I use it immediately.

    For the default configuration file the main fields to change for this example follow (and
    here is an example httpd.conf file):
    • User www
    • Group www
    • ServerName yourserver.yourdomain.here
    • Port 8088   ## pick a test port
    • Listen 8088   ## in 'IfDefine SSL' section
    • Listen 443   ## this is the standard secure port!
    • <VirtualHost _default_:443>
    • AddType application/x-httpd-php .php
      AddType application/x-httpd-php-source .phps
    • # your Hello.pm script for mod_perl testing:
      <Location /hello>
      SetHandler perl-script
      PerlHandler Apache::Hello
      </Location>
    • SSLCertificateFile /apache/conf/server.crt
    • SSLCertificateKeyFile /apache/conf/server.key
      # in this example I generate the key and crt files into /apache/conf

  11. If you do not already have a server key and certificate then create them. In this example I assume that openssl is in your path because you have installed it. If not, then you should add it to your path (according to this example it is in /usr/local/src/chr/openssl-0.9.5a/apps).

    Note also that I am certifying my own key. Presumably you will get a Certifying Authority to sign your key if you are doing any serious work with your web tree (eg: commercial work):

    • ROOT# cd /www/apache/conf

    • # set up a path of random files:
      ROOT# randfiles='/var/log/messages:/proc/net/unix:/proc/stat:/proc/ksyms'

    • # generate the server key
      ROOT# openssl genrsa -rand $randfiles -out server.key 1024

    • # generate the signing request (don't add a password when certifying it yourself).
      Note that it is important that the Common Name match your fully-qualified web server name!
      ROOT# openssl req -new -nodes -out request.pem -key server.key

    • # sign your own key (validity for one year in this example):
      ROOT# openssl x509 -in request.pem -out server.crt -req \
      -signkey server.key -days 365


    • Protect your key and certificate:
      ROOT# chmod 400 server.*

    • Delete the request file:
      ROOT# rm request.pem

    • Optionally encrypt your key (you will have to provide the password each time you start apache, but you may have a very good reason for doing this!):
      ROOT# mv server.key server.key.unencrypted
      ROOT# openssl rsa -des3 -in server.key.unencrypted -out server.key
      ROOT# chmod 000 server.key.unencrypted ## better yet delete it!

    • Oops, you changed your mind. You decide to remove the encryption password from your key:
      ROOT# openssl rsa -in server.key -out server.key.un
      ROOT# mv server.key.un server.key
      ROOT# chmod 400 server.key
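One way to sanity-check the result is to confirm that server.crt really was issued for server.key by comparing modulus digests. This is a common openssl idiom, not part of the original steps:

```shell
# Succeed only if the certificate matches the private key.
# $1 = key file, $2 = certificate file
check_cert_matches_key() {
    k=$(openssl rsa  -in "$1" -noout -modulus | openssl md5)
    c=$(openssl x509 -in "$2" -noout -modulus | openssl md5)
    [ "$k" = "$c" ]
}
# Usage:
# check_cert_matches_key server.key server.crt && echo "key and cert match"
# Also inspect the certificate's subject and validity period:
# openssl x509 -in server.crt -noout -subject -dates
```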

  12. Start apache first without ssl to make sure it still works:
    ROOT# chroot /www /apache/bin/apachectl start
    $ lynx -dump http://yourserver:8088/

  13. Re-start apache with ssl and test it with netscape (note that the URL is specified as https):
    ROOT# chroot /www /apache/bin/apachectl stop
    ROOT# chroot /www /apache/bin/apachectl startssl
    $ netscape https://yourserver/

  14. At this point you probably want to edit your web server configuration and set the server on the standard ports 80 and 443 if your test configuration did not use these ports.

* Some Security Considerations

  1. See the 'security tips' in the online apache documentation for some help in this area. One extra precaution to take is to change the permissions on the httpd scripts and binaries:
    ROOT# chmod ugo-rw /www/apache/bin/*
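To go with the tips above, a quick audit for world-writable files in the chroot tree; a generic check, not from the original document:

```shell
# List regular files in the tree that anyone can write to; aside from
# deliberately writable areas, the output should normally be empty.
audit_world_writable() {
    find "$1" -type f -perm -0002 -print
}
# Usage (as root):
# audit_world_writable /www
```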

* Escaping Your Chroot-ed Environment

  1. You should be very careful when you set about deliberately escaping from your chroot-ed environment -- you need to assess the risks and benefits. In the UNIX world there is always more than one way to accomplish a task, and you should think about other ways of solving your problem.

    Nevertheless, I provide an example C utility that implements a customised remote-shell command to email a file generated by forms output to someone. It might be invoked via a cgi-bin script, or via PHP. Example:
          <?php
          ...
          /** construct the file name as $f  **/
          $cmd = "/bin/mail \"-s Some-subject-line -t webmaster@localhost -f $f\"";
          $op = exec( $cmd, $arr, $retval );
          ...
          ?>
    The file is called wwwmail.c and it is linked here.

    Almost anything can be done this way if you code-up your own small utility. I have written a similar utility to exec sqlplus macros for example. But these sorts of utilities are risky, and need to be carefully evaluated.

    For this particular problem (emailing forms-output to someone) you might be better off putting files to be mailed in a directory and having a cron job pass through every few minutes and process them...
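The cron-based alternative can be sketched as follows. All paths, file names, and the recipient here are assumptions, not part of the original setup:

```shell
# Sketch of the cron-driven mail spool: the web app drops *.msg files
# into a spool directory; a cron job mails and removes them.
SPOOL=/www/var/mailspool          # writable by the web-tree owner (888)
RECIPIENT=webmaster@localhost

process_spool() {
    for f in "$1"/*.msg; do
        [ -e "$f" ] || continue   # nothing queued
        mail -s "forms output" "$RECIPIENT" < "$f" && rm -f "$f"
    done
}
# crontab entry (every 5 minutes), assuming this script is installed as:
# */5 * * * *  /usr/local/sbin/process-mailspool
```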

* Cleanup, etc. After Installation

  1. Remove temporary links needed for installation of apache and perl ( remember to put them back if you need to rebuild or upgrade any package ):
    ROOT# rm /apache /usr/Local

  2. Automate the startup of Apache by installing a startup script called httpd in /etc/rc.d/init.d/ Here are two examples:
  3. Standard apache on port 80
  4. Apache on ports 80 and 443 (startssl)

  5. Then run chkconfig on it to set the runlevel symbolic links (and verify it with '--list'):
    ROOT# chkconfig --add httpd
    ROOT# chkconfig --list httpd
    httpd       0:off   1:off   2:on    3:on    4:on    5:on    6:off
  6. Automate log file trimming. On a Red Hat system you can specify which log files and which parameters to use in /etc/logrotate.conf. Here is an example file

* New: Harvesting files based on RPMs

    I haven't had time to document this fully, but you can use the contents of RPMs to create a chroot-ed web tree without compiling the sources. To this end, I have two scripts. I will document this technique more completely later...
  1. Script file based on Red Hat 7.0 that will harvest the RPMs
  2. Script file for creating temporary SSL key and certificate (testing purposes only!!!)
2006/09/08 14:13